“I attended both the Architect’s Master Class and the Project Design Master Class. Before these two classes I had almost lost all hope of ever being able to figure out why the efforts of my team were never leading to a successful end, and I was struggling to find a working solution to stop the insane death march we were on. The Master Classes opened my eyes to a world where software development is elevated to the level of all other engineering disciplines and is conducted in a professional, predictable, and reliable manner, resulting in high-quality working software developed on time and within budget. The knowledge gained is priceless! From revealing how to create a solid and sound architecture, which withstands ever-changing user requirements, to the intricate details on how to plan and guide the project to a successful end—all this was presented with expertise and professionalism that are hard to match. Considering that every bit of distilled truth Juval shared with us in class is acquired, tested, and proven in real life, it transforms this learning experience into a powerful body of knowledge that is an absolute necessity for anyone who aspires to be a Software Architect.”
—Rossen Totev, software architect/project lead
“The Project Design Master Class is a career-changing event. Having come from an environment where deadlines and budgets are almost pathologically abused, having the opportunity to learn from Juval was a godsend. Piece by piece he provided the parts and the appropriate tools for properly designing a project. The result is that costs and timelines are kept in check in the dynamic and even chaotic environment of modern software development. Juval says that you are going to engage in asymmetric warfare against overdue and over cost, and you walk away truly feeling that you have a gun to take to a knife fight. There is no magic—only the application of basic engineering and manufacturing tenets to software—but you will go back to your office feeling like a wizard.”
—Matt Robold, software development manager, West Covina Service Group
“Fantastic experience. Changed my way of thinking on how to approach software development. I always knew some of what I was thinking was right with regard to design and coding. I never could express it in words but now I have them. It not only affects my way of thinking about software design but also other types of design.”
—Lee Messick, lead architect
“The software project I work on was plagued with breakneck deadlines for years. Trying to understand software development methodologies and proper process felt like an energy drain because I had to battle management’s unwillingness to change, on top of meeting the unreasonable demands of my clients. I was fighting a war on two fronts and felt hopeless. I felt like a rōnin. The Master Class provided a rush of clarity I never knew existed. It taught the exact knowledge that I was searching for. I learned profound techniques that transformed my understanding of how software projects operate. I now have the tools to efficiently and effectively navigate my project in a torrent of never-ending requirement changes. In a world of chaos this class brought order. I am forever grateful to IDesign. My life will never be the same.”
—Aaron Friedman, software architect
“Life changing. I feel like a tuned piano after collecting dust for a couple of decades.”
—Jordan Jan, CTO/architect
“The course was amazing. Easily this was the most intense but rewarding week of my professional life.”
—Stoil Pankov, software architect
“Learning from Juval Löwy has changed my life. I went from being just a developer to being a true software architect, applying engineering principles from other disciplines to design not just software, but also my career.”
—Kory Torgersen, software architect
“The Architect Master Class is a life lesson on skills and design—which I took twice. It was so transformational the first time I attended that I wished I had taken this class decades back, when I started my career. Even taking it for the second time has only gotten me to 25% because the ideas are so profound. The required brain rewiring and unlearning is really painful, but I needed to come back again with more of my colleagues. Finally, every day that goes by I reflect back on what Juval said in the classes and use that to help my teams implementing even the small things so that we can all eventually call ourselves Professional Engineers. (P.S. I took 100 pages of notes second time around!)”
—Jaysu Jeyachandran, software development manager, Nielsen
“If you are frustrated, lacking energy, and demotivated after seeing and experiencing many failed attempts of our industry, the class is a boost of rejuvenation. It takes you to the next level of professional maturity and also gives you the hope and confidence that you can apply things properly. You will leave the Project Design Master Class with a new mindset and enough priceless tools that will give you no excuse to ever fail a software project. You get to practice, you get your hands dirty, you get insight, and experience. Yes, you CAN be accurate when it is time to provide your stakeholders with the cost, the time, and the risk of a project. Now, just don’t wait for a company to send you to this class. If you are serious about your career, you should hurry to take this or any IDesign Master Classes. It is the best self-investment you can make. Thank the entire great team of IDesign for their continuous efforts in helping the software industry become a solid engineering discipline.”
—Lucian Marian, software architect, Mirabel
“As someone in their late twenties, relatively early in their career, I can honestly say that this course has changed my life and the way I view my career path. I honestly expect this to be one of the most pivotal points of my life.”
—Alex Karpowich, software architect
“I wanted to thank you for a (professional) life-changing week. Usually I can’t sit at class more than 50% of the time—it is boring and they don’t teach me anything I couldn’t teach myself or already know. In the Architect’s Master Class I sat for nine hours a day and couldn’t get enough of it: I learned what my responsibilities are as an architect (I thought the architect is only the software designer), the engineering aspect of software, the importance of delivering not only on time but also on budget and on quality, not to wait to ‘grow’ to be an architect but to manage my career, and how to quantify and measure what I previously considered as hunches. I have much more insight from this week and many pieces are now in place. I can’t wait to attend the next Master Class.”
—Itai Zolberg, software architect
A Method for System and Project Design
Boston • Columbus • New York • San Francisco • Amsterdam • Cape Town
Dubai • London • Madrid • Milan • Munich • Paris • Montreal • Toronto • Delhi • Mexico City
São Paulo • Sydney • Hong Kong • Seoul • Singapore • Taipei • Tokyo
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.
The author and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.
For information about buying this title in bulk quantities, or for special sales opportunities (which may include electronic versions; custom cover designs; and content particular to your business, training goals, marketing focus, or branding interests), please contact our corporate sales department at corpsales@pearsoned.com or (800) 382-3419.
For government sales inquiries, please contact governmentsales@pearsoned.com.
For questions about sales outside the U.S., please contact intlcs@pearson.com.
Visit us on the Web: informit.com/aw
Library of Congress Control Number: 2019950124
Copyright © 2020 Pearson Education, Inc.
Cover image: Nattapol_Sritongcom/Shutterstock
All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, request forms and the appropriate contacts within the Pearson Education Global Rights & Permissions Department, please visit www.pearson.com/permissions/.
ISBN-13: 978-0-13-652403-8
ISBN-10: 0-13-652403-6
To My Father, Thomas Charles (Tommy) Löwy
Eliminating Analysis-Paralysis
Avoid Functional Decomposition
Problems with Functional Decomposition
Reflecting on Functional Decomposition
Example: Functional Trading System
Volatility-Based Decomposition
Decomposition, Maintenance, and Development
Volatility-Based Decomposition and Testing
Solutions Masquerading as Requirements
Example: Volatility-Based Trading System
Semi-Closed/Semi-Open Architecture
Chapter 5 System Design Example
Add Tradesman/Contractor Use Case
Project Design and Project Sanity
Chapter 7 Project Design Overview
Software Development Plan Review
Accelerating Software Projects
Total, Direct, and Indirect Costs
Criticality versus Activity Risk
Chapter 11 Project Design in Action
Unlimited Resources (Iteration 1)
Infrastructure First (Iteration 2)
Going Subcritical (Iteration 7)
Compression Using Better Resources
Rebuilding the Time–Cost Curve
Chapter 12 Advanced Techniques
Finding the Decompression Target
Designing a Network of Networks
Chapter 13 Project Design Example
Individual Activity Estimations
Dependencies and Project Network
Duration, Planned Progress, and Risk
Chapter 14 Concluding Thoughts
Architecture versus Estimations
Senior Developers as Junior Architects
Activity Life Cycle and Status
Projections and Corrective Actions
Appendix B Service Contract Design
From Service Design to Contract Design
Hardly anyone gets into software development because they were forced into it. Many literally fall in love with programming and decide to pursue it for a living. And yet there is a vast gap between what most hoped their career would be like and the dark, depressing reality of software development. The software industry as a whole is in a deep crisis. What makes the crisis so acute is that it is multidimensional; every aspect of software development is broken:
Cost. There is weak correlation between the budget set for a project and what it will actually cost to develop the system. Many organizations do not even try to address the cost issue, perhaps because they simply do not know how, or perhaps because doing so will force them to recognize they cannot afford the system. Even if the cost of the first version of a new system is justified, often the cost across the life of the system is much higher than what it should have been due to poor design and an inability to accommodate changes. Over time, maintenance costs become so prohibitive that companies routinely decide to wipe the slate clean, only to end up shortly thereafter with an equally or even more expensive mess as a new system. No other industry opts for a clean slate on a regular basis simply because doing so does not make economic sense. Airlines maintain jumbo jets for decades, and a house can be a century old.
Schedule. Deadlines are often just arbitrary and unenforceable constructs because they have little to do with the time it takes to actually develop the system. For most developers, deadlines are these useless things whooshing by as they plow ahead. If the development team does meet the deadline, everyone is surprised because the expectation is always for them to fail. This, too, is a direct result of a poor system design that causes changes and new work to ripple through the system and invalidate previously completed work. Moreover, it is the result of a very inefficient development process that ignores both the dependencies between activities and the fastest, safest way of building the system. Not only is the time to market for the whole system exceedingly long, but the time for a single feature may be just as inflated. It is bad enough when the project slips its schedule; it is even worse when the slip was hidden from management and customers since no one had any idea what the true status of the project was.
Requirements. Developers often end up solving the wrong problems. There is a perpetual communication failure between the end customers or their internal intermediaries (such as marketing) and the development team. Most developers also fail to accommodate their failure to capture the requirements. Even when requirements are perfectly communicated, they will likely change over time. This change invalidates the design and unravels everything the team tried to build.
Staffing. Even modest software systems are so complex that they have exceeded the capacity of the human brain to make sense of them. The internal and external complexity is a direct result of poor system architecture, which in turn leads to convoluted systems that are very difficult to maintain, extend, or reuse.
Maintenance. Most software systems are not maintained by the same people who developed them. The new staff does not understand how the system operates, and as a result they constantly introduce new problems as they try to solve old ones. This quickly drives up the cost of maintenance and the time to market, and leads to clean-slate efforts or canceled projects.
Quality. Perhaps nothing is as broken with software systems as quality. Software has bugs, and the word “software” is itself synonymous with “bugs.” Developers cannot conceive of defect-free software systems. Fixing defects often increases the defect count, as does adding features, or just plain maintenance. Poor quality is a direct result of a system architecture that does not lend itself to being testable, understandable, or maintainable. Just as important, most projects do not account for essential quality-control activities and fail to allocate enough time for every activity to be completed in an impeccable manner.
Decades ago, the industry started developing software to solve the world’s problems. Today, software development itself is a world-class problem. The problems of software development frequently manifest themselves in nontechnical ways such as a high-stress working environment, high turnover rate, burnout, lack of trust, low self-esteem, and even physical illness.
None of the problems in software development is new.[1] Indeed, some people have spent their entire careers in software development without seeing software done right even once. This leads them to believe that it simply cannot be done, and they are dismissive of any attempt to address these issues because “that’s just the way things are.” They may even fight those who are trying to improve software development. They have already concluded that this goal is impossible, so anyone who is trying to get better results is trying to do the impossible, which insults their intellect.
1. Edsger W. Dijkstra, “The Humble Programmer: ACM Turing Lecture,” Communications of the ACM 15, no. 10 (October 1972): 859–866.
My own track record is a counterexample demonstrating that it is possible to successfully develop software systems. Every project for which I was responsible shipped on schedule, on budget, and with zero defects. I continued this record after founding IDesign, where we have helped customers again and again deliver on their commitments.
This consistent, repeatable track record of success was no accident. My training and schooling were in systems engineering, of both physical systems and software systems, and it was easy to recognize the similarities across the two worlds. Applying practical principles to software design, ideas that are common-sense in other engineering fields made sense in software systems, too. It never occurred to me not to treat software development as engineering or to develop a system without a design or without a plan. I saw no need to compromise on my conviction, or to give in to expediencies because doing the right things just worked, and the appalling consequences of not doing so were plain to see. I was fortunate to have great mentors, to be at the right place at the right time to see what worked and what did not, to have the opportunity to participate early on in large critical efforts, and to be part of cultures of excellence.
In recent years, I have noticed that the industry’s problems are getting worse. More and more software projects fail. These failures are getting more expensive in both time and money, and even completed projects tend to stray further afield from their original commitments. The crisis is worsening not just because the systems are getting bigger or because of the cloud, aggressive deadlines, or higher rate of change. I suspect the real reason is that the knowledge of how to design and develop software systems is slowly fading from within the development ranks. Once, most teams had a veteran who mentored the young and handed down the tribal knowledge. Nowadays these mentors have moved on or are retiring. In their absence, the rank and file is left with access to infinite information but zero knowledge.
I wish there was just one thing you could do to fix the software crisis such as using a process, a development methodology, a tool, or a technology. Unfortunately, to fix a multidimensional problem, you need a multidimensional solution. In this book I offer a unified remedy: righting software.
In the abstract, all I suggest is to design and develop software systems using engineering principles. The good news is that there is no need to reinvent the wheel. Other engineering disciplines are quite successful, so the software industry can borrow their key universal design ideas and adapt them to software. You will see in this book a set of first principles in software engineering, as well as a comprehensive set of tools and techniques that apply to software systems and projects. To succeed, you have to assume an engineering perspective. Ensuring that the software system is maintainable, extensible, reusable, affordable, and feasible in terms of time and risk are all engineering aspects, not technical aspects. These engineering aspects are traced directly to the design of the system and the project. Since the term software engineer largely refers to a software developer, the term software architect has emerged to describe the person in the team who owns all the design aspects of the project. Accordingly, I refer to the reader as a software architect.
The ideas in this book are not the only things you will need to get right, but they certainly are a good start because they treat the root cause of the problems mentioned earlier. That root cause is poor design, be it of the software system itself or of the project used to build that system. You will see that it is quite possible to deliver software on schedule and on budget and to design systems that meet every conceivable requirement. The results are also systems that are easy to maintain, extend, and reuse. I hope that by practicing these ideas you will right not just your system but your career and rekindle your passion for software development.
The book demonstrates a structured engineering approach to system and project design. The methodology has two parts, reflected by the structure of this book: system design (commonly known as architecture) and project design. Both parts complement each other and are required for success. The appendices provide some supplemental content to the main discussion.
In most technical books, each chapter addresses a single topic and discusses it in depth. This makes the book easier to write, but that is typically not how people learn. In contrast, in this book, the teaching is analogous to a spiral. In both parts of the book, each chapter reiterates ideas from the previous chapters, going deeper or developing ideas using additional insight across multiple aspects. This mimics the natural learning process. Each chapter relies on those that preceded it, so you should read the chapters in order. Both parts of the book include a detailed case study that demonstrates the ideas as well as additional aspects. At the same time, to keep the iterations concise, as a general rule I avoid repeating myself, so even key points are discussed only once.
Here is a brief summary of the chapters and appendices:
Chapter 1 introduces this key idea: To succeed, you must design both the system and the project to build it. Both designs are essential for eventual success. You cannot design the project without the architecture, and it is pointless to design a system that you cannot build.
Chapter 2 is dedicated to decomposing the system into the components that make up its architecture. Most people decompose systems in the worst possible way, so the chapter starts with explaining what not to do. Once that is established, you will see how to correctly decompose the system, and learn a set of simple analysis tools and observations that help in that process.
Chapter 3 improves on the ideas of Chapter 2 by introducing structure. You will see how to capture requirements, how to layer your architecture, the taxonomy of the components of the architecture, their interrelationships, specific classification guidelines, and some related issues such as subsystems design.
Chapter 4 shows how to assemble the system components into a valid composition that addresses the requirements. This short chapter contains several of the book’s key design principles, and it leverages the previous two chapters into a powerful mental tool you will use in every system.
Chapter 5 is an extensive case study that demonstrates the system design ideas discussed so far. This final iteration of the system design spiral presents an actual system, aligns the system design with the business, and shows how to produce the architecture and validate it.
Chapter 6 introduces project design. Since most people have never even heard of it, let alone practiced it, the chapter presents the concept and provides the motivation for engaging in project design. This is iteration zero of the project design spiral.
Chapter 7 provides a broad overview of how to design a project. It starts by defining success in software development, and then presents the key concepts of educated decisions, project staffing, project network, critical path, scheduling, and cost. The chapter covers most of the ideas and techniques used in subsequent chapters, and it ends with an important discussion of roles and responsibilities.
Chapter 8 dives into the project network and its use as a design tool. You will see how to model the project as a network diagram, learn the key concept of float, understand how to use floats in staffing and scheduling, and recognize how floats relate to risk.
Chapter 9 defines the possible tradeoffs between time and cost in any project and prescribes ways to accelerate any project by working cleanly and correctly. Beyond that, you will learn the key concepts of compression, the time–cost curve, and the elements of cost.
第 10 章介绍了大多数项目中缺少的元素:量化风险。您将了解如何衡量风险并将其映射到上一章中的时间和成本概念,以及如何根据网络计算风险。风险通常是评估选项的最佳方式,也是一流的规划工具。
Chapter 10 presents the missing element in most projects: quantified risk. You will see how to measure and map risk to the time and cost concepts from the previous chapter, and how to calculate risk based on the network. Risk is often the best way of evaluating options and is a first-class planning tool.
第 11 章通过系统地介绍设计项目所涉及的步骤,将前几章的所有概念付诸实践。虽然它具有示例的特质,但其目标是展示设计项目时使用的思维过程,以及如何准备供业务决策者审查。
Chapter 11 puts all the concepts of the previous chapters into use via a systematic walkthrough of the steps involved in designing a project. While it has the makings of an example, the objective is to demonstrate the thought process used when designing a project, as well as how to prepare for review by business decision makers.
第 12 章遵循螺旋式学习模型,提供高级技术和概念。这些技术适用于各种复杂程度的项目,从简单到最具挑战性。这些高级技术与前几章相互补充,您经常会将它们组合使用。
Following the spiral model of learning, Chapter 12 offers advanced techniques and concepts. These techniques are useful in projects with all levels of complexity, from the simple to the most challenging. These advanced techniques complement the previous chapters and each other, and you will often use them in combination.
第 13 章是与第 5 章的系统设计示例相对应的项目设计示例。它也是一个案例研究,展示了设计项目的端到端流程。本章的重点是案例研究,而不是技术。
Chapter 13 is the project design example corresponding to the system design example of Chapter 5. It, too, is a case study demonstrating the end-to-end process of designing a project. The focus in this chapter is more on the case study and less on the techniques.
最后一章从设计的技术层面回顾过去,提供一系列指南、技巧、观点和开发流程理念。它首先回答了何时设计项目这一重要问题,最后讨论了项目设计对质量的影响。
This final chapter takes a step back from the technical aspects of design and offers a collection of guidelines, tips, perspectives, and development process ideas. It starts by answering the important question of when to design a project, and it ends with the effect project design has on quality.
附录 A向您展示了如何跟踪项目相对于计划的进度以及如何在必要时采取纠正措施。项目跟踪更多的是关于项目管理而不是项目设计,但它对于确保您在工作开始后履行承诺至关重要。
Appendix A shows you how to track the project’s progress with regard to the plan and how to take corrective actions when needed. Project tracking is more about project management than it is about project design, but it is crucial in assuring you meet your commitments once the work starts.
架构本身很广泛且粗糙,您必须设计其每个组件的细节。这些细节中最重要的是服务契约。附录 B向您指出了设计服务契约的正确方法。此外,对模块化、大小和成本的讨论与本书的大多数章节非常吻合。
The architecture itself is broad and coarse, and you have to design the details of each of its components. The most important of these details is the service contract. Appendix B points you toward the correct way of designing service contracts. In addition, the discussion of modularity, size, and cost resonates very well with most chapters in this book.
附录 C是本书中提到的关键指令、指南和注意事项的综合列表。该标准简洁明了,只讲“是什么”,而不是“为什么”。标准背后的原理可在本书的其余部分找到。
Appendix C is a consolidated list of the key directives, guidelines, and dos and don’ts mentioned throughout this book. The standard is terse and is all about the “what,” not the “why.” The rationale behind the standard is found in the rest of the book.
虽然本书的目标读者是软件架构师,但它的读者群要广泛得多。我假设您,即读者,是一名架构师或高级软件专业人员、项目经理或身兼数职的人。话虽如此,有志于提高技能的开发人员将从本书中受益匪浅。无论您目前的职位如何,本书都将为您打开职业生涯的大门。当您第一次拿起这本书时,您可能还不是一位成就卓著的架构师,但一旦您阅读并掌握了方法论,您就会成为世界顶尖人才之一。
While this book targets software architects, it has a much broader audience. I assume that you, the reader, are an architect or senior software professional, a project manager, or someone who wears multiple hats. That said, aspiring developers wanting to grow their skill set will benefit greatly from the book. Regardless of your current position, this book will open doors for you through the rest of your career. You may not be an accomplished architect when you first pick up this book, but you will be among the top in the world once you have read it and have mastered the methodology.
本书的技术和理念适用于任何编程语言(如 C++、Java、C# 和 Python)、平台(如 Windows、Linux、移动、本地和云)和项目规模(从最小到最大的项目)。它们还涵盖所有行业(从医疗保健到国防)、所有商业模式和公司规模(从初创公司到大型企业)。
The techniques and ideas in this book apply regardless of programming language (such as C++, Java, C#, and Python), platform (such as Windows, Linux, mobile, on-premises, and cloud), and project size (from the smallest to the largest projects). They also cross all industries (from healthcare to defense), all business models, and all company sizes (from the startup to the large corporation).
我对读者最重要的假设是,你非常关心自己所做的事情,当前的失败和浪费让你苦恼。你想做得更好,但缺乏指导或被糟糕的做法所困扰。
The most important assumption I have made about the reader is that you care about what you do, at a deep level, and the current failures and waste distress you. You want to do better but lack guidance or are confused by bad practices.
阅读本书的唯一前提是要有开放的心态。过去的失败和挫折是加分项。
The only prerequisite for this book is an open mind. Past failures and frustration are a plus.
本书采用以下印刷约定:
The book uses the following typographic conventions:
粗体
Bold
用于定义术语和概念。
Used for defining terms and concepts.
指示
Directive
用于第一原则、设计规则或关键指导和建议。
Used for first principles, design rules, or key guidance and advice.
保留字
Reserved Words
用于引用方法论的保留字。
Used when referring to reserved words of the methodology.
本书的网页提供了示例文件、附录和勘误表。您可以通过以下地址访问此页面:
The web page for this book provides sample files, addenda, and errata. You can access this page at the following address:
http://www.rightingsoftware.org
您可以在本书的“下载支持文件”链接下找到示例文件和相关支持材料。
You will find the example files and related supporting material for this book under the “Download Support Files” link.
有关本书的更多信息,请访问
For additional information about this book, go to
informit.com/title/9780136524038。
informit.com/title/9780136524038.
您也可以通过以下地址联系作者:
You can also contact the author at this address:
在 InformIT 网站上注册您的Righting Software 副本,以便在更新和/或更正可用时方便地访问它们。要开始注册过程,请转到informit.com/register并登录或创建帐户。输入产品 ISBN (9780136524038) 并单击提交。在“已注册产品”选项卡上查找此产品旁边的“访问奖励内容”链接,然后单击该链接访问任何可用的奖励材料。如果您希望收到有关新版本和更新的独家优惠通知,请选中复选框以接收我们的电子邮件。
Register your copy of Righting Software on the InformIT site for convenient access to updates and/or corrections as they become available. To start the registration process, go to informit.com/register and log in or create an account. Enter the product ISBN (9780136524038) and click Submit. Look on the Registered Products tab for an Access Bonus Content link next to this product, and follow that link to access any available bonus materials. If you would like to be notified of exclusive offers on new editions and updates, please check the box to receive email from us.
首先,我要感谢两位以各自独特的方式敦促我写这本书的人:加德·梅尔 (Gad Meir) 和贾科·肯帕宁 (Jarkko Kemppainen)。
Let me start by thanking the two who urged me to write the book, each in their own unique way: Gad Meir and Jarkko Kemppainen.
感谢开发编辑兼顾问 Dave Killian:如果再进行编辑,我就得把你列为合著者了。接下来,感谢 Beth Siron 审阅原始手稿。以下人员花时间审阅了草稿:Chad Michel、Doug Durham、George Stevens、Josh Loyd、Riccardo Bennett-Lovsey 和 Steve Land。
Thanks go to the development editor and sounding board, Dave Killian: Any more editing and I would have to list you as a co-author. Next, thanks to Beth Siron for reviewing the raw manuscript. The following people contributed their time by reviewing the draft: Chad Michel, Doug Durham, George Stevens, Josh Loyd, Riccardo Bennett-Lovsey, and Steve Land.
最后,我要感谢我的妻子达娜(Dana),她一直激励我写作,让我有时间远离家人;还要感谢我的父母,他们把我对工程的热爱传递给我。
Finally, I am grateful to my wife, Dana, who keeps inspiring me to write and makes it possible for me to take the time away from the family; and to my parents, who imparted to me the love for engineering.
对于初级建筑师来说,有很多选择。
For the beginner architect, there are many options.
对于建筑大师来说,这样的机会寥寥无几。
For the master architect, there are but a few.
《建筑师之禅1》简单地指出,对于建筑师初学者来说,做任何事情都有很多选择。然而,对于建筑大师来说,好的选择只有几个,而且通常只有一个。
The Zen of Architects¹ simply states that for the beginner architect, there are many options for doing pretty much anything. For the master architect, however, there are only a few good options, and typically only one.
1. https://en.wikipedia.org/wiki/Zen_Mind,_Beginner's_Mind
新手架构师经常被设计软件系统的过多模式、想法、方法和可能性弄得不知所措。软件行业充满了各种想法,人们渴望学习和提高自己,包括正在阅读本书的你。然而,由于完成任何给定设计任务的正确方法太少,你最好只关注那些方法,而忽略那些干扰。软件架构大师知道该怎么做;就像受到超自然的启发一样,他们立即集中精力,得出正确的设计解决方案。
Beginner architects are often perplexed by the plethora of patterns, ideas, methodologies, and possibilities for designing their software system. The software industry is bursting at the seams with ideas and people eager to learn and improve themselves, including you who are reading this book. However, since there are so few correct ways of doing any given design task, you might as well focus only on those and ignore the noise. Master software architects know to do just that; as if by supernatural inspiration, they immediately zoom in and yield the correct design solution.
架构师之禅不仅适用于系统设计,也适用于构建该系统的项目。是的,构建项目和将工作分配给团队成员的方法数不胜数,但它们是否都同样安全、快速、廉价、有用、有效和高效?总架构师还设计构建系统的项目,甚至帮助管理层决定他们是否能负担得起该项目。
The Zen of Architects applies not just to the system design but also to the project that builds it. Yes, there are countless ways of structuring the project and assigning work to the team members, but are they all equally safe, fast, cheap, useful, effective, and efficient? The master architect also designs the project to build the system and even helps management decide if they can afford the project in the first place.
真正掌握任何学科都需要一个过程。除了极少数例外,没有人是天生的专家。我自己的职业生涯就是一个很好的例子。近 30 年前,我从一名初级架构师开始,当时架构师这个词在软件组织中还没有被广泛使用。我先是担任项目架构师,然后担任部门架构师,到 20 世纪 90 年代末,我成为了硅谷一家财富 100 强公司的首席软件架构师。2000 年,我创立了 IDesign,这是一家专门从事软件设计的公司。在 IDesign,我们设计了数百个系统和项目。虽然每个项目都有自己特定的架构和项目计划,但我发现,无论是客户、项目、系统、技术还是开发人员,我的设计建议在抽象上都是一样的。
True mastery of any subject is a journey. With very few exceptions, no one is a born expert. My own career is a case in point. I started as a junior architect almost 30 years ago when the term architect was not commonly used within software organizations. Moving on first as a project architect then as a division architect, by the late 1990s I was the chief software architect of a Fortune 100 company in Silicon Valley. In 2000, I founded IDesign as a company solely dedicated to software design. At IDesign, we have since designed hundreds of systems and projects. While each engagement had its own specific architecture and project plan, I observed that no matter the customer, the project, the system, the technology, or the developers, my design recommendations were, in the abstract, the same.
因此,我问了自己一个简单的问题:您是否真的必须成为拥有数十年系统设计经验和数十个项目经验的软件架构大师才能知道该怎么做?或者您是否可以以某种方式构建它,以便任何清楚了解底层方法的人都可以制作出像样的系统和项目设计?
I therefore asked myself a simple question: Do you really have to be a master software architect with decades of experience designing systems and dozens of projects under your belt to know the right thing to do? Or can you structure it somehow so that anyone with a clear understanding of the underlying methodology can produce a decent system and project design?
第二个问题的答案是肯定的。我把这个结果称为“方法”,也是本书的主题。我已经将“方法”应用于大量项目,并在世界各地教授和指导过几千名建筑师,我可以证明,只要应用得当,它就会非常有效。我并不是在贬低良好态度、技术技能和分析能力的价值。无论使用哪种方法,这些都是成功的必要因素。遗憾的是,只有这些因素还不够;我经常看到项目尽管拥有这些优秀品质和属性的人员,但仍然会失败。然而,当与“方法”结合起来时,这些因素就会给你一个奋起反抗的机会。通过将你的设计建立在合理的工程原理之上,你将学会避开普遍存在的错误做法和错误直觉。
The answer to the second question is a resounding affirmative. I call the result The Method, and it is the subject of this book. Having applied The Method across a multitude of projects, having taught and mentored a few thousand architects the world over, I can attest that, when applied properly, it works. I am not discounting here the value of having a good attitude, technical skills, and analytical capabilities. These are necessary ingredients for success regardless of the methodology you use. Sadly, these ingredients are insufficient; I often see projects fail despite having people with all these great qualities and attributes. However, when combined with The Method, these ingredients give you a fighting chance. By grounding your design on sound engineering principles, you will learn to steer clear of the misguided practices and false intuition that are the prevailing wisdom.
该方法是一种简单有效的分析和设计技术。你可以用一个公式来表达该方法:
The Method is a simple and effective analysis and design technique. You can express The Method as a formula:
方法=系统设计+项目设计
The Method = System Design + Project Design
通过系统设计,该方法提出了将大型系统分解为小型模块化组件的方法。该方法为组件的结构、角色和语义以及这些组件应如何交互提供了指导方针。结果就是系统的架构。
With system design, The Method lays out a way of breaking down a big system into small modular components. The Method offers guidelines for the structure, role, and semantics of the components and how these components should interact. The result is the architecture of the system.
通过项目设计,该方法可帮助您为管理层提供多种构建系统的选项。每个选项都是进度、成本和风险的某种组合。每个选项还可作为系统组装说明,并为项目的执行和跟踪做好准备。
With project design, The Method helps you provide management with several options for building the system. Each option is some combination of schedule, cost, and risk. Each option also serves as the system assembly instructions, and it sets up the project for execution and tracking.
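As a rough illustration of what such a set of options could look like when presented to management, the sketch below models each option as a combination of schedule, cost, and risk. The class name, field names, units, option names, and all the numbers are hypothetical and do not come from the book. Risk is simply assumed to be a value between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectOption:
    """One way of building the system (hypothetical structure)."""
    name: str
    duration_months: float   # schedule
    cost_man_months: float   # cost
    risk: float              # assumed scale: 0 = no risk, 1 = maximum risk


def affordable(options, max_months, max_cost, max_risk):
    """Return the options that fit within management's stated limits."""
    return [
        o for o in options
        if o.duration_months <= max_months
        and o.cost_man_months <= max_cost
        and o.risk <= max_risk
    ]


# Made-up numbers, for illustration only.
options = [
    ProjectOption("compressed", duration_months=9, cost_man_months=70, risk=0.75),
    ProjectOption("normal", duration_months=11, cost_man_months=60, risk=0.55),
    ProjectOption("low-risk", duration_months=13, cost_man_months=65, risk=0.35),
]

viable = affordable(options, max_months=12, max_cost=65, max_risk=0.6)
print([o.name for o in viable])  # prints ['normal']
```

Putting the three dimensions side by side is what makes the options comparable. The decision itself remains with management.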
项目设计是本书的第二部分,对于成功而言,它比系统设计重要得多。如果项目有足够的时间和资源,并且风险可以接受,即使是平庸的系统设计也能成功。但是,如果项目没有足够的时间或资源来构建系统,或者项目风险太大,即使是世界一流的系统设计也会失败。项目设计也比系统设计更复杂,因此需要额外的工具、想法和技术。
Project design is the second part of the book and is far more important for success than system design. Even a mediocre system design can succeed if the project has adequate time and resources and if the risk is acceptable. However, even a world-class system design will fail if the project has inadequate time or resources to build the system or if the project is too risky. Project design is also more intricate than system design and, as such, requires additional tools, ideas, and techniques.
由于该方法结合了系统设计和项目设计,因此它实际上是一个设计过程。多年来,软件行业非常重视开发过程,但对设计过程关注甚少。本书旨在填补这一空白。
Because it combines system and project design, The Method is actually a design process. Over the years, the software industry has given great attention to the development process but has devoted little attention to the design process. This book aims at filling this gap.
设计验证至关重要,因为组织不应冒险让团队开始开发不充分的架构或开发组织无法承担的系统。该方法支持并实现这一关键任务,使架构师能够有合理的信心断言所提出的设计是足够的;也就是说,设计满足两个关键目标。首先,设计必须满足客户要求。其次,设计必须解决组织或团队的能力和约束。
Design validation is critical because an organization should not risk having a team start developing against an inadequate architecture or developing a system the organization cannot afford to build. The Method supports and enables this critical task, allowing the architect to assert with reasonable confidence that the proposed design is adequate; that is, the design fulfills two key objectives. First, the design must address the customer requirements. Second, the design must address the capabilities and constraints of the organization or the team.
一旦开始编码,由于成本和进度问题,更改架构通常是不可接受的。实际上,这意味着,如果没有系统设计验证,就有可能锁定一个不完美的架构,最坏的情况是,锁定一个怪物。组织将不得不在接下来的几年和几个版本中尝试使用最终的系统,直到下一次大规模重写。设计不良的软件系统可能会严重损害企业,使其失去响应商业机会的能力,甚至可能因软件维护成本不断上升而导致财务崩溃。
Once the coding starts, changing the architecture is often unacceptable due to the cost and schedule implications. In practice, this means that without system design validation, there is a risk of locking in, at best, an imperfect architecture and, at worst, a monstrosity. The organization will have to try to live with the resulting system for the next few years and several versions until the next big rewrite. A poorly designed software system may seriously damage the business, depriving it of the ability to respond to business opportunities, and may even financially ruin it with escalating software maintenance costs.
尽早验证设计至关重要。例如,在工作开始三年后发现某个特定想法或整个架构是错误的,这在智力上很有趣,但没有实际价值。理想情况下,在项目开始一周后,您必须知道架构是否站得住脚(或站不住脚)。任何更长时间都存在以可疑架构开始开发的风险。以下章节精确描述了如何验证系统设计。
Early validation of the design is imperative. For example, discovering three years after the work started that a particular idea or the whole architecture was wrong is intellectually interesting but of no practical value. Ideally, one week into the project, you must know if the architecture is going to hold water (or not). Anything longer runs the risk of commencing development with a questionable architecture. The following chapters describe precisely how to validate a system design.
请注意,我这里指的是系统设计、架构,而不是系统的详细设计。详细设计为架构中的每个组件生成关键的实现工件,例如接口、类层次结构和数据契约。详细设计需要更长的时间来生成,可以在项目执行期间完成,并且可能会随着系统的构建或发展而发生变化。
Note that I am referring here to the system design, the architecture, not the detailed design of the system. Detailed design produces for each component in the architecture the key implementation artifacts, such as interfaces, class hierarchies, and data contracts. Detailed design takes longer to produce, can be done during the project execution, and may change as the system is constructed or evolved.
同样,您必须验证您的项目设计。项目中期耗尽时间或超出预算(或两者兼而有之)是绝对不可接受的。未能履行承诺将限制您的职业生涯。您必须主动验证您的项目设计,以确保手头的团队能够交付项目。
Similarly, you must validate your project design. Running out of time or running over budget (or both) mid-project is simply unacceptable. Failing to meet your commitments will limit your career. You must proactively validate your project design to ensure that the team at hand can deliver the project.
除了提供架构和项目计划之外,该方法的目标是消除设计对项目的风险。任何项目都不应该因为架构过于复杂而无法由开发人员构建和维护而失败。该方法可以高效、有效地发现架构,并且只需很短的时间。同样的好处也适用于项目设计。任何项目都不应该因为从一开始就没有足够的时间或资源而失败。本书向您展示了如何准确计算项目工期和成本以及如何做出明智的决策。
In addition to providing the architecture and project plans, the objective of The Method is to remove design as a risk to the project. No project should fail because the architecture was too complex for developers to build and maintain. The Method discovers the architecture efficiently and effectively and does so in a short period of time. The same benefit applies to project design. No project should fail because it did not have enough time or resources from the start. This book shows you how to accurately calculate the project duration and costs and how to drive educated decisions.
使用该方法,您只需几天时间(通常为三到五天)即可完成整个系统设计,项目设计所需的时间也差不多。考虑到这项工作的崇高目标,即为新系统制定系统架构和项目计划,持续时间可能显得太短。典型的业务系统每隔几年才会有一次新设计的选择。为什么不花 10 天时间在架构上呢?与多年的系统寿命相比,多花五天甚至不算什么。但是,增加设计时间通常不会改善结果,甚至可能有害。
Using The Method, you can produce an entire system design in mere days, typically in three to five days, with project design taking a similar amount of time. Given the lofty goals of the effort, namely, producing the system architecture and the project plans for a new system, the duration may look too short. Typical business systems get the option of a new design only every few years. Why not spend 10 days on the architecture? Measured against a system lifetime of years, five additional days are not even a rounding error. However, adding design time often does not improve the result and can even be detrimental.
大多数工作环境的时间管理效率极低,这主要是人性使然。时间紧迫迫使你(和其他相关人员)集中精力,确定优先事项,并完成设计。你应该迅速果断地完成方法。
Most work environments have horrendously inefficient time management, mostly due to human nature. A time crunch forces you (and the others involved) to focus, to prioritize, and to produce the design. You should go through The Method quickly and decisively.
一般来说,设计并不耗时(与实施相反)。建筑师按小时收费,通常最多只花一两个星期来设计一栋房子。根据建筑师的设计建造一栋房子可能需要与承包商一起痛苦地工作两到三年,但建筑师很快就完成了建筑设计。
In general, design is not time-consuming (as opposed to implementation). Building architects charge hourly and often work only a week or two at most designing a house. Constructing a house from the architect’s design might take an agonizing two to three years of working with contractors, and yet the architect did not take long to produce the architecture.
时间紧迫也有助于避免设计过度。帕金森定律2指出,工作总是会扩展到填满分配的时间。如果给 10 天时间来完成一个可以在 5 天内完成的设计,架构师很可能会花 10 天时间进行设计。架构师会利用额外的时间来设计一些无关紧要的方面,这些方面只会增加复杂性,从而不成比例地增加未来几年的实施和维护成本。限制设计时间会迫使你制作出足够好的设计。
The time crunch also helps avoid design gold plating. Parkinson’s law² states that work always expands to fill the allotted time. Given 10 days to complete a design that could be completed in five days, the architect will likely work on the design for 10 days. The architect will use the extra time to design frivolous aspects that add nothing but complexity, disproportionately increasing the cost of implementation and maintenance for years to come. Limiting the design time forces you to produce a good-enough design.
2. Cyril N. Parkinson,《帕金森定律》,《经济学人》 (1955 年 11 月 19 日)。
2. Cyril N. Parkinson, “Parkinson’s Law,” The Economist (November 19, 1955).
分析瘫痪是一种困境,当某个人(或某个团体)原本很有能力、聪明甚至勤奋(大多数软件架构师都是如此)却陷入了看似无休止的分析、设计、新发现和更多分析的循环中时,就会发生这种情况。这个人或团体实际上陷入了瘫痪,无法产生任何有成效的成果。
Analysis-paralysis is a predicament that occurs when someone (or a group) who is otherwise capable, clever, and even hardworking (as are most software architects) is stuck in a seemingly endless cycle of analysis, design, new revelations, and back to more analysis. The person or group is effectively paralyzed and precluded from producing any productive outcome.
瘫痪的主要原因是没有意识到系统和项目的设计决策树。设计决策树是一个通用概念,适用于所有设计任务,而不仅仅是软件工程。任何复杂实体的设计都是许多较小设计决策的集合,这些决策以树状结构分层排列。树中的每个分支都代表一种可能的设计选项,可导致更多更精细的设计决策。树的叶子代表满足需求的完整设计解决方案。每片叶子都是一致、独特且有效的解决方案,在某些方面与所有其他叶子都不同。
The main reason for the paralysis is being unaware of the design decision tree for both the system and the project. The design decision tree is a general concept that applies to all design tasks, not just in software engineering. The design of any complex entity is a collection of many smaller design decisions, arranged hierarchically in a tree-like structure. Each branching in the tree represents a possible design option that leads to additional, finer design decisions. The leaves of the tree represent complete design solutions for the requirements. Each leaf is a consistent, distinct, and valid solution, different in some ways from all other leaves.
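To make the shape of such a tree concrete, here is a minimal sketch in which each node is one design option and each root-to-leaf path is one complete design solution. The `Decision` class, the `leaves` helper, and the example choices are all hypothetical and serve only to illustrate the structure described above.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """A node in a design decision tree: one option chosen at some level."""
    choice: str
    children: list["Decision"] = field(default_factory=list)


def leaves(node, path=()):
    """Yield every root-to-leaf path; each path is one complete design."""
    path = path + (node.choice,)
    if not node.children:
        yield path
        return
    for child in node.children:
        yield from leaves(child, path)


# Hypothetical decisions for a small system.
tree = Decision("system", [
    Decision("single process", [
        Decision("embedded database"),
        Decision("external database"),
    ]),
    Decision("distributed services", [
        Decision("synchronous calls"),
        Decision("queued calls"),
    ]),
])

designs = list(leaves(tree))
for design in designs:
    print(" -> ".join(design))
```

Each printed path differs from every other path in at least one decision, which mirrors the point that every leaf is a distinct solution.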
当负责设计的个人或团队不知道正确的决策树时,他们会从树根以外的某个地方开始。在某个时刻,下游设计决策将使先前的决策无效;在这两点之间做出的所有决策都将无效。以这种方式进行设计类似于对设计决策树进行冒泡排序。由于冒泡排序涉及的操作数量大致与所涉及元素数量的平方一样多,因此惩罚是严重的。如果不遵循决策树,一个简单的软件系统需要大约 20 个系统和项目设计决策,则可能需要 400 次设计迭代。参加这么多会议(即使您将其分散到一段时间内)会让人陷入瘫痪。甚至不可能有时间进行 40 次迭代。当系统和项目设计工作超出时间时,开发将在系统和项目处于不成熟状态时开始。这会将使设计决策无效的发现推迟到未来更糟糕的时间点,那时时间、精力和工件已经与错误的选择相关联。本质上,您已经最大化了错误设计决策的成本。
When the person or group in charge of producing the design is unaware of the correct decision tree, they start at some place other than the root of the tree. Invariably, at some point, a downstream design decision will invalidate a prior decision; all decisions made in between these two points will be invalid. Designing this way is akin to performing a bubble sort of the design decision tree. Since bubble sort roughly involves as many operations as the square of the number of elements involved, the penalty is severe. A simple software system requiring some 20 system and project design decisions potentially has 400 design iterations if you do not follow the decision tree. Going through so many meetings (even if you spread it over time) is paralysis. Being given the time to perform even 40 iterations is unlikely. When the system and project design effort is out of time, development will commence with the system and the project in an immature state. This defers discoveries that invalidate the design decisions to an even worse point in the future when time, effort, and artifacts already are associated with the incorrect choices. In essence, you have maximized the cost of the incorrect design decision.
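The quadratic penalty can be checked with a few lines of code. The sketch below is an illustration only, not part of The Method: it runs a plain bubble sort (without any early-exit optimization) over 20 items in the worst possible order and counts the comparisons. This simple variant performs exactly n(n-1)/2 comparisons, which grows at the same quadratic rate as the rough n-squared figure quoted above.

```python
def bubble_sort_comparisons(items):
    """Bubble-sort a copy of items and count the comparisons performed."""
    data = list(items)
    comparisons = 0
    n = len(data)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            comparisons += 1
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return data, comparisons


decisions = 20
# Worst case: the decisions are visited in exactly the reverse of the right order.
ordered, count = bubble_sort_comparisons(range(decisions, 0, -1))

print(count)           # prints 190, the exact count n(n-1)/2 for this variant
print(decisions ** 2)  # prints 400, the rough n-squared estimate
```

Either way, the effort is an order of magnitude larger than the 20 steps needed when each decision is made once, in the order the tree dictates.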
事实证明,大多数软件业务系统都有很多共同点,至少决策树的轮廓在这类系统中不仅是共同的,而且是统一的。叶子自然是不同的。
As it turns out, most software business systems have a lot in common, and at least the outline of the decision tree is not only common but also uniform across such systems. The leaves are naturally different.
该方法为系统设计和项目设计提供了典型业务系统的决策树。只有在设计好系统之后,设计用于构建该系统的项目才有意义。这些设计工作(包括系统和项目)中的每一个都有自己的设计决策子树。该方法将指导您完成它,从根开始,避免重复工作和重新评估先前的决策。
The Method provides the decision tree of a typical business system both for the system design and for the project design. Only after you have designed the system is there any point in designing the project to build that system. Each of these design efforts, both the system and the project, has its own subtree of design decisions. The Method guides you through it, starting at the root, avoiding rework and reevaluation of prior decisions.
修剪决策树的最有价值的技术之一是应用约束。正如 Fredrick Brooks 博士3 所指出的,与常识或直觉相反,最糟糕的设计问题是一张干净的画布。没有约束,设计应该很容易,对吗?错了。干净的画布应该让每一位架构师都感到害怕。出错或违背未说明的约束的方式数不胜数。约束越多,设计任务越容易。允许的余地越小,设计就越明显和清晰。在一个完全受约束的系统中,没有什么可设计的:它就是它本身。由于总是存在约束(无论是显式的还是隐式的),通过遵循设计决策树,该方法对系统和项目施加越来越多的约束,直到设计快速收敛和解决。
One of the most valuable techniques in pruning the decision tree is the application of constraints. As pointed out by Dr. Frederick Brooks,³ contrary to common wisdom or intuition, the worst design problem is a clean canvas. Without constraints, the design should be easy, right? Wrong. The clean canvas should terrify every architect. There are infinitely many ways of getting it wrong or going against unstated constraints. The more constraints there are, the easier the design task is. The less leeway allowed, the more obvious and clear the design. In a totally constrained system, there is nothing to design: it is what it is. Since there are always constraints (whether explicit or implicit), by following the design decision tree, The Method places increasing constraints on the system and the project, to the point that the design converges and resolves quickly.
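A small sketch can show how each added constraint shrinks the set of candidate designs. Everything here is an assumption made for illustration: the decision names, the options, and the three constraints are invented, and real design decisions are rarely independent enough to enumerate this way.

```python
from itertools import product

# Hypothetical open design decisions and their candidate options.
decisions = {
    "hosting": ["cloud", "on-premises"],
    "database": ["relational", "document", "key-value"],
    "client": ["web", "mobile", "desktop"],
}

# Hypothetical constraints, each a predicate over a candidate design.
constraints = [
    lambda d: d["hosting"] == "on-premises",   # e.g., a regulatory rule
    lambda d: d["database"] == "relational",   # e.g., existing team skills
    lambda d: d["client"] != "desktop",        # e.g., no installers allowed
]


def candidates(decisions):
    """Enumerate every combination of options (the unconstrained canvas)."""
    names = list(decisions)
    for combo in product(*(decisions[n] for n in names)):
        yield dict(zip(names, combo))


remaining = list(candidates(decisions))
print(len(remaining))  # prints 18 (2 * 3 * 3 candidate designs)
for rule in constraints:
    remaining = [d for d in remaining if rule(d)]
    print(len(remaining))  # prints 9, then 3, then 2
```

With no constraints there are 18 designs to argue about. After three constraints only two remain, and one more constraint would leave nothing to design.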
3. Frederick P. Brooks Jr.,《设计的设计:一位计算机科学家的论文集》(新泽西州上萨德尔河:Addison-Wesley,2010 年)。
3. Frederick P. Brooks Jr., The Design of Design: Essays from a Computer Scientist (Upper Saddle River, NJ: Addison-Wesley, 2010).
该方法的一个重要优势是能够传达设计理念。一旦参与者熟悉了架构的结构和设计语义,该方法就可以共享设计理念并准确传达设计要求。你可以将设计背后的思维过程传达给团队。你应该分享指导你设计架构的权衡和见解,以明确的方式记录操作假设和由此产生的设计决策。
An important advantage of The Method is in communicating design ideas. Once participants are familiar with the structure of the architecture and the design semantics, The Method enables sharing design ideas and precisely conveying what the design requires. You can communicate the thought process behind the design to the team. You should share the tradeoffs and insights that guided you in the architecture, documenting in an unambiguous way the operational assumptions and the resulting design decisions.
设计意图的这种清晰度和透明度对于架构的生存至关重要。好的设计是经过精心构思、在开发过程中得以生存并最终成为客户机器上的工作位。您必须能够将设计传达给开发人员并确保他们重视设计背后的意图和概念。您必须通过使用评审、检查和指导来执行设计。由于结合了定义良好的服务语义和结构,该方法在这种类型的沟通方面表现出色。
This level of clarity and transparency in design intent is critical for architecture survival. A good design is one that was well conceived, survived through development, and ended up as working bits on customer machines. You must be able to communicate the design to the developers and ensure that they value the intent and the concepts behind the design. You must enforce the design by using reviews, inspection, and mentoring. The Method excels at this type of communication because of the combination of well-defined service semantics and structure.
请放心,如果负责构建系统的开发人员不理解和重视设计,他们就会破坏它。无论进行多少设计审查或代码审查,都无法修复这种破坏。审查的目的应该是尽早发现对架构的无意偏离。
Rest assured that if the developers who are tasked with building the system do not understand and value the design, they will butcher it. No amount of design or code review can ever fix that butchery. The purpose of reviews should be to catch the unintended deviation from the architecture as early as possible.
在向项目经理、管理人员或其他利益相关者传达项目计划时,情况也是如此。清晰、明确、可比较的选项是做出明智决策的关键。人们做出错误的决定,往往是因为他们不了解项目,对项目的行为方式有错误的思维模型。通过为项目建立正确的时间、成本和风险模型,架构师可以做出正确的决策。该方法提供了正确的词汇和指标,以便以简单明了的方式与决策者沟通。一旦管理人员了解了项目设计的可能性,他们就会成为其最坚定的拥护者,并坚持以这种方式工作。再多的激烈争论也无法达到一组简单的图表和数字所能达到的效果。此外,项目设计不仅在项目开始时很重要。随着工作的开始,您可以使用项目设计的工具向管理层传达变更的效果和可行性。附录 A讨论了项目跟踪和管理变更。
The same holds true when it is time to communicate the project plan to project managers, managers, or other stakeholders. Clear, unambiguous, comparable options are key to educated decisions. When people make the wrong decisions, it is often because they do not understand the project and have the wrong mental model for how projects behave. By producing the correct models for the project across time, cost, and risk, the architect can enable the right decision. The Method provides the right vocabulary and metrics for communicating with decision makers in a simple and concise way. Once managers are exposed to the possibilities of project design, they will become its greatest advocates and insist on working that way. No amount of passionate argument can accomplish what a simple set of charts and numbers can. Moreover, project design is important not only at the beginning of the project. As the work commences, you can use the tools of project design to communicate to management the effect and viability of changes. Appendix A discusses project tracking and managing changes.
除了向开发商和管理人员传达设计外,该方法还允许建筑师准确、轻松地向其他建筑师传达设计。通过这种方式从审查和批评中获得的见解非常宝贵。
Besides communicating the design to developers and managers, The Method allows the architect to accurately and easily communicate the design to other architects. The insights you gain from review and criticism in this manner are invaluable.
布鲁克斯在 1987 年写道:“没有灵丹妙药。” 4当然,方法论不是灵丹妙药。使用方法论并不能保证成功,如果与项目中的其他内容分开使用或仅仅为了使用而使用方法论,可能会使情况变得更糟。
Brooks wrote in 1987, “There is no silver bullet.”⁴ Certainly, The Method is not one. Using The Method does not guarantee success and may make matters worse if used in isolation from everything else in the project or just for the sake of using it.
4. Frederick P. Brooks Jr.,“没有灵丹妙药:软件工程的本质和意外”,计算机 20,第 4 期(1987 年 4 月)。
4. Frederick P. Brooks Jr., “No Silver Bullet: Essence and Accidents of Software Engineering,” Computer 20, no. 4 (April 1987).
该方法不会剥夺架构师在创建正确架构方面的创造力和努力。架构师仍然负责提炼系统所需的行为。面对越来越大的压力,架构师仍然要对架构错误、未能将设计传达给开发人员或未能在不损害架构的情况下领导开发工作直至交付负责。此外,正如本书第二部分所述,架构师必须根据架构制定可行的项目设计。架构师必须根据可用资源校准项目,以资源可以生产什么、涉及的风险以及最后期限。为了项目本身的利益而走走项目设计的道路是没有意义的。建筑师必须消除任何偏见,并制定正确的规划假设和结果计算。
The Method does not take away the architect’s creativity and effort in producing the right architecture. The architect is still responsible for distilling the required behavior of the system. The architect is still liable for getting the architecture wrong or for failing to communicate the design to developers or for failing to lead the development effort until delivery without compromising the architecture, all in the face of mounting pressure. Furthermore, as illustrated in the second part of this book, the architect must produce a viable project design, stemming out of the architecture. The architect must calibrate the project to the available resources, to what the resources can produce, to the risks involved, and to the deadline. Going through the motions of project design for their own sake is pointless. The architect must eliminate any bias and produce the correct set of planning assumptions and resulting calculations.
该方法为系统和项目设计提供了一个很好的起点,并列出了要避免的事情。但是,只有当你诚实地做事,并投入时间和精力来收集所需的信息时,该方法才会有效。你必须从根本上关心设计过程及其产生的结果。
The Method provides a good starting point for system and project design, along with a list of the things to avoid. However, The Method works only as long as you do it truthfully while devoting the time and mental energy to gather the required information. You must fundamentally care about the design process and what it produces.
软件架构是软件系统的高级设计和结构。虽然与构建系统相比,设计系统既快捷又便宜,但确保架构正确至关重要。系统构建完成后,如果架构存在缺陷、错误或无法满足您的需求,则维护或扩展系统的成本将非常高昂。
Software architecture is the high-level design and structure of the software system. While designing the system is quick and inexpensive compared with building the system, it is critical to get the architecture right. Once the system is built, if the architecture is defective, wrong, or just inadequate for your needs, it is extremely expensive to maintain or extend the system.
任何系统架构的本质都是将整个系统的概念分解成其组成组件,无论是汽车、房屋、笔记本电脑还是软件系统。良好的架构还规定了这些组件在运行时如何交互。识别系统组成组件的行为称为系统分解。
The essence of the architecture of any system is the breakdown of the concept of the system as a whole into its constituent components, be it a car, a house, a laptop, or a software system. A good architecture also prescribes how these components interact at run-time. The act of identifying the constituent components of a system is called system decomposition.
正确的分解至关重要。错误的分解意味着错误的架构,而这反过来会在未来带来可怕的痛苦,通常会导致系统完全重写。
The correct decomposition is critical. A wrong decomposition means a wrong architecture, which in turn inflicts horrendous pain in the future, often leading to a complete rewrite of the system.
在过去,这些构建块是 C++ 对象,后来是 COM、Java 或 .NET 组件。在现代系统和本书中,服务(如面向服务)是架构中最精细的单元。但是,用于实现组件的技术及其细节(如接口、操作和类层次结构)是详细的设计方面,而不是系统分解。事实上,这些细节可以改变,而不会影响分解,因此也不会影响架构。
In years past, these building blocks were C++ objects and later COM, Java, or .NET components. In a modern system and in this book, services (as in service-orientation) are the most granular unit of the architecture. However, the technology used to implement the components and their details (such as interfaces, operations, and class hierarchies) are detailed design aspects, not system decomposition. In fact, such details can change without ever affecting the decomposition and therefore the architecture.
不幸的是,大多数(如果不是绝大多数)软件系统的设计都不正确,可以说是以最糟糕的方式设计的。设计缺陷是系统分解不正确的直接结果。因此,本章首先解释为什么常见的分解方式存在根本缺陷,然后讨论方法分解方法背后的原理。您还将看到一些在设计系统时可以利用的强大且有用的技术。
Unfortunately, the majority, if not the vast majority, of all software systems are not designed correctly and arguably are designed in the worst possible way. The design flaws are a direct result of the incorrect decomposition of the systems. This chapter therefore starts by explaining why the common ways of decomposition are flawed to the core and then discusses the rationale behind The Method’s decomposition approach. You will also see some powerful and helpful techniques to leverage when designing the system.
功能分解根据系统的功能将系统分解为构建块。例如,如果系统需要执行一组操作,如开票、计费和运输,则最终会得到 Invoicing 服务、Billing 服务和 Shipping 服务。
Functional decomposition decomposes a system into its building blocks based on the functionality of the system. For example, if the system needs to perform a set of operations, such as invoicing, billing, and shipping, you end up with the Invoicing service, the Billing service, and the Shipping service.
功能分解的问题很多而且很严重。至少,功能分解将服务与需求耦合在一起,因为服务是需求的反映。所需功能的任何变化都会对功能服务产生影响。随着时间的推移,这种变化是不可避免的,并且会要求事后进行新的分解以反映新的需求,从而对系统造成痛苦的未来变化。除了对系统进行昂贵的更改之外,功能分解还会妨碍重用并导致系统和客户端过于复杂。
The problems with functional decomposition are many and acute. At the very least, functional decomposition couples services to the requirements because the services are a reflection of the requirements. Any change in the required functionality imposes a change on the functional services. Such changes are inevitable over time and impose a painful future change to your system by requiring a new decomposition after the fact to reflect the new requirements. In addition to costly changes to the system, functional decomposition precludes reuse and leads to overly complex systems and clients.
考虑一个使用三个服务 A、B 和 C 的简单功能分解系统,它们的调用顺序为先 A、再 B、最后 C。由于功能分解也是基于时间的分解(先调用 A,然后调用 B),它实际上排除了服务的单独重用。假设另一个系统也需要一个 B 服务(例如 Billing)。B 的结构中内置了这样的概念:它是在 A 服务之后、C 服务之前被调用的(例如先 Invoicing,然后才针对发票进行 Billing,最后 Shipping)。任何试图将 B 服务从第一个系统中提取出来并放入第二个系统的尝试都将失败,因为在第二个系统中,没有人在它之前执行 A、在它之后执行 C。当您提取 B 服务时,A 和 C 服务都挂在它上面。B 根本不是独立的可重用服务:A、B 和 C 是一组紧密耦合的服务。
Consider a simple functionally decomposed system that uses three services A, B, and C, which are called in the order of A then B then C. Because functional decomposition is also decomposition based on time (call A and then call B), it effectively precludes individual reuse of services. Suppose another system also needs a B service (such as Billing). Built into the fabric of B is the notion that it was called after an A and before a C service (such as first Invoicing, and only then Billing against an invoice, and finally Shipping). Any attempt to lift the B service from the first system and drop it in the second system will fail because, in the second system, no one is doing A before it and C after it. When you lift the B service, the A and the C services are hanging off it. B is not an independent reusable service at all—A, B, and C are a clique of tightly coupled services.
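The time-based coupling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names echo the Invoicing, Billing, and Shipping example, while every method name, the dictionary shapes, and the `reuse_billing_without_invoicing` helper are hypothetical and not taken from the book.

```python
class InvoicingService:
    """The 'A' service: produces the invoice everything downstream depends on."""

    def create_invoice(self, order_id: str, amount: float) -> dict:
        return {"invoice_id": f"INV-{order_id}", "amount": amount}


class BillingService:
    """The 'B' service: built into it is the assumption that A already ran."""

    def bill(self, invoice: dict) -> dict:
        # Billing only knows how to bill against an invoice produced by Invoicing.
        if "invoice_id" not in invoice:
            raise ValueError("Billing requires an invoice produced by Invoicing")
        return {"receipt_id": f"RCPT-{invoice['invoice_id']}", "paid": invoice["amount"]}


class ShippingService:
    """The 'C' service: built into it is the assumption that B already ran."""

    def ship(self, receipt: dict) -> str:
        if "receipt_id" not in receipt:
            raise ValueError("Shipping requires a receipt produced by Billing")
        return f"shipped against {receipt['receipt_id']}"


def reuse_billing_without_invoicing(amount: float) -> bool:
    """Try to lift Billing into a second (hypothetical) system that has no Invoicing."""
    billing = BillingService()
    try:
        billing.bill({"subscription_id": "SUB-1", "amount": amount})
        return True
    except ValueError:
        # Nobody performed A before B, so B cannot be reused on its own.
        return False
```

In the original system the three calls succeed when made in order; in the second system the attempt to reuse Billing fails because nothing plays the role of Invoicing before it.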
执行功能分解的一种方法是,拥有与功能变体数量相同的服务。这种分解会导致服务数量激增,因为一个规模相当大的系统可能拥有数百种功能。您不仅拥有太多服务,而且这些服务通常会重复许多通用功能,每个服务都根据其情况进行定制。服务数量激增会导致集成和测试成本过高,并增加整体复杂性。
One way of performing functional decomposition is to have as many services as there are variations of the functionalities. This decomposition leads to an explosion of services, since a decently sized system may have hundreds of functionalities. Not only do you have too many services, but these services often duplicate a lot of the common functionality, each customized to their case. The explosion of services inflicts a disproportional cost in integration and testing and increases overall complexity.
另一种功能分解方法是将执行操作的所有可能方式集中到大型服务中。这会导致服务规模膨胀,使其过于复杂且无法维护。这种庞大的巨无霸成为原始功能所有相关变体的丑陋垃圾场,服务内部和服务之间关系错综复杂。
Another functional decomposition approach is to lump all possible ways of performing the operations into mega services. This leads to bloating in the size of the services, making them overly complex and impossible to maintain. Such god monoliths become ugly dumping grounds for all related variations of the original functionality, with intricate relationships inside and between the services.
因此,功能分解往往会导致服务要么太大而太少,要么太小而太多。您经常会在同一个系统中同时看到这两种情况。
Functional decomposition, therefore, tends to make services either too big and too few or too small and too many. You often see both afflictions side by side in the same system.
功能分解通常会导致系统层次结构扁平化。由于每个服务或构建块都专用于特定功能,因此必须有人将这些离散的功能组合成所需的行为。这个角色通常由客户端承担。当客户端负责编排服务时,系统将变成一个扁平的两层系统:客户端和服务,任何额外分层的概念都消失了。假设您的系统需要按顺序执行三个操作(或功能):A、B 和 C。如图 2-1 所示,客户端必须将服务拼接在一起。
Functional decomposition often leads to flattening of the system hierarchy. Since each service or building block is devoted to a specific functionality, someone must combine these discrete functionalities into a required behavior. That someone is often the client. When the client is the one orchestrating the services, the system becomes a flat two-tier system: clients and services, and any notion of additional layering is gone. Suppose your system needs to perform three operations (or functionalities): A, B and C, in that order. As illustrated in Figure 2-1, the client must stitch the services together.
图 2-1臃肿的客户端编排功能
Figure 2-1 Bloated client orchestrating functionality
通过在客户端中添加编排逻辑,您会用系统的业务逻辑污染客户端代码。客户端不再只是调用系统上的操作或向用户呈现信息。客户端现在非常了解所有内部服务:如何调用它们,如何处理它们的错误,如何在 A 成功之后补偿 B 的失败,等等。调用服务几乎总是同步的,因为客户端按照预期的顺序进行(先 A,再 B,最后 C),否则很难在保持对外界响应的同时确保调用的顺序。此外,客户端现在与所需的功能耦合在一起。操作中的任何更改(例如调用 B' 而不是 B)都会迫使客户端反映该更改。糟糕设计的标志是系统的任何更改都会影响客户端。理想情况下,客户端和服务应该能够独立发展。几十年前,软件工程师就发现将业务逻辑包含在客户端中是个坏主意。然而,当按照图 2-1 所示进行设计时,您不得不用排序、顺序、错误补偿和调用持续时间的业务逻辑来污染客户端。最终,客户端不再是客户端,它已经成为系统。
By bloating the client with the orchestration logic, you pollute the client code with the business logic of the system. The client is no longer just about invoking operations on the system or presenting information to users. The client is now intimately aware of all internal services, how to call them, how to handle their errors, how to compensate for the failure of B after the success of A, and so on. Calling the services is almost always synchronous because the client proceeds along the expected sequence of A then B then C, and it is difficult otherwise to ensure the order of the calls while remaining responsive to the outside world. Furthermore, the client is now coupled to the required functionality. Any change in the operations, such as calling B' instead of B, forces the client to reflect that change. The hallmark of a bad design is when any change to the system affects the client. Ideally, the client and services should be able to evolve independently. Decades ago, software engineers discovered that it was a bad idea to include business logic with the client. Yet, when designed as in Figure 2-1, you are forced to pollute the client with the business logic of sequencing, ordering, error compensation, and duration of the calls. Ultimately, the client is no longer the client—it has become the system.
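A minimal sketch of the bloated client of Figure 2-1 follows, assuming three hypothetical services that record their work in a shared log. The names `FunctionalService`, `BloatedClient`, and `submit` are illustrative assumptions, not the author's code. Note how the sequencing, the error handling, and the compensation all live in the client.

```python
class FunctionalService:
    """A hypothetical functional service that can be told to fail."""

    def __init__(self, name: str, log: list, fail: bool = False):
        self.name = name
        self.log = log
        self.fail = fail

    def execute(self) -> None:
        if self.fail:
            raise RuntimeError(f"{self.name} failed")
        self.log.append(self.name)

    def undo(self) -> None:
        self.log.append(f"undo {self.name}")


class BloatedClient:
    """The client of Figure 2-1: it owns ordering and compensation logic."""

    def __init__(self, a: FunctionalService, b: FunctionalService, c: FunctionalService):
        self.a, self.b, self.c = a, b, c

    def submit(self) -> bool:
        # Business logic in the client: the required order is A, then B, then C.
        self.a.execute()
        try:
            self.b.execute()
        except RuntimeError:
            # The client must know how to compensate for A's success.
            self.a.undo()
            return False
        try:
            self.c.execute()
        except RuntimeError:
            # The client must also know how to unwind B and then A.
            self.b.undo()
            self.a.undo()
            return False
        return True
```

Replacing B with a B' service, or changing the required order, means editing `submit`, and every additional client (web, mobile, rich client) would need its own copy of this method.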
如果有多个客户端(例如,富客户端、网页、移动设备),每个客户端都试图调用相同的功能服务序列,该怎么办?您注定要在客户端之间复制该逻辑,这使得维护所有这些客户端既浪费又昂贵。随着功能的变化,您现在被迫在多个客户端上跟上这种变化,因为所有客户端都会受到影响。通常,一旦出现这种情况,开发人员就会试图避免对服务的功能进行任何更改,因为这会对客户端产生连锁反应。由于客户端众多,每个客户端都有自己定制的排序版本,因此更改或交换服务变得更加困难,从而阻止在客户端之间重用相同的行为。实际上,您最终需要维护多个复杂的系统,试图使它们保持同步。最终,当更改在开发和生产过程中被迫进行时,这既会扼杀创新,又会延长上市时间。
What if there are multiple clients (e.g., rich clients, web pages, mobile devices), each trying to invoke the same sequence of functional services? You are destined to duplicate that logic across the clients, making maintenance of all those clients wasteful and expensive. As the functionality changes, you now are forced to keep up with that change across multiple clients, since all of them will be affected. Often, once that is the case, developers try to avoid any changes to the functionality of the services because of the cascading effect it will have on the clients. With the multiplicity of clients, each with its own version of the sequencing tailored to its needs, it becomes even more challenging to change or interchange services, thus precluding reuse of the same behavior across the clients. Effectively, you end up maintaining multiple complex systems, trying to keep them all in sync. Ultimately, this leads to both stifling of innovation and increased time to market when the changes are forced through development and production.
作为迄今为止讨论的功能分解问题的一个例子,请考虑图 2-2。这是我审查的一个系统的圈复杂度分析的可视化。使用的设计方法是功能分解。
As an example of the problems with functional decomposition discussed thus far, consider Figure 2-2. It is the visualization of cyclomatic complexity analysis of a system I reviewed. The design methodology used was functional decomposition.
图 2-2功能设计的复杂性分析
Figure 2-2 Complexity analysis of a functional design
圈复杂度度量的是类或服务代码中独立路径的数量。内部越复杂、耦合越紧,圈复杂度得分就越高。用于生成图 2-2 的工具对系统中的各个类进行了测量和评级。在可视化中,类越复杂,其面积就越大,颜色就越深。乍一看,你会看到三个非常大且非常复杂的类。维护 MainForm 有多容易?这仅仅是一个表单、一个 UI 元素、一个从用户到系统的干净管道,还是它就是系统?从 FormSetup 的大小和阴影中可以看出设置 MainForm 所需的复杂性。Resources 也不甘示弱,非常复杂,因为更改 MainForm 中使用的资源非常复杂。理想情况下,Resources 应该是微不足道的,由简单的图像和字符串列表组成。系统的其余部分由数十个小的、简单的类组成,每个类都专用于特定的功能。较小的类实际上处于三个大类的阴影之下。然而,虽然每个小类可能都很微不足道,但小类的数量本身就是一个复杂性问题,涉及这么多类之间的复杂集成。结果是组件太多,组件太大,客户端臃肿。
Cyclomatic complexity measures the number of independent paths through the code of a class or service. The more the internals are convoluted and coupled, the higher the cyclomatic complexity score. The tool used to generate Figure 2-2 measured and rated the various classes in the system. In the visualization, the more complex the class is, the larger and darker it is in color. At first glance, you see three very large and very complex classes. How easy would it be to maintain MainForm? Is this just a form, a UI element, a clean conduit from the user to the system, or is it the system? Observe the complexity required to set up MainForm in the size and shade of FormSetup. Not to be outdone, Resources is very complex, since it is very complex to change the resources used in MainForm. Ideally, Resources should have been trivial, comprising simple lists of images and strings. The rest of the system is made up of dozens of small, simple classes, each devoted to a particular functionality. The smaller classes are literally in the shadow of the three massive ones. However, while each of the small classes may be trivial, the sheer number of the smaller classes is a complexity issue all on its own, involving intricate integration across that many classes. The result is both too many components and too big components as well as a bloated client.
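To make the metric concrete, here is a rough sketch that approximates cyclomatic complexity for a piece of Python source by counting decision points and adding one. It is a simplification built on the standard `ast` module and is not the tool that produced Figure 2-2; the function name and the sample snippets are assumptions made for illustration.

```python
import ast

# Node types that each add one independent path through the code.
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity as (number of decision points) + 1."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' adds one extra path per additional operand.
            complexity += len(node.values) - 1
    return complexity


SIMPLE = "def f(x):\n    return x + 1\n"

BRANCHY = """
def g(x, y):
    if x > 0 and y > 0:
        for i in range(x):
            if i % 2:
                y += i
    elif x < 0:
        y = -y
    return y
"""
```

A straight-line function such as `SIMPLE` scores 1, while `BRANCHY` scores 6 (two `if` statements, one `elif`, one `for`, and one `and`). A class like MainForm in Figure 2-2 would accumulate a far higher total across its methods.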
图 2-1的分解的另一个问题是它需要多个系统入口点。客户端(或多个客户端)需要在三个地方进入系统:一次进入A,然后进入B,然后进入C服务。这意味着有多个地方需要担心身份验证、授权、可伸缩性、实例管理、事务传播、身份、托管等。当您需要更改执行其中任何一个方面的方式时,您需要在服务和客户端的多个位置进行更改。随着时间的推移,这些多次更改使得添加新的和不同的客户端非常昂贵。
Another problem with the decomposition of Figure 2-1 is that it requires multiple points of entry to the system. The client (or clients) needs to enter the system in three places: once for the A, then for the B, then for the C service. This means there are multiple places to worry about authentication, authorization, scalability, instance management, transaction propagation, identities, hosting, and so on. When you need to change the way you perform any one of these aspects, you will need to change it in multiple places across services and clients. Over time, these multiple changes make adding new and different clients very expensive.
作为图 2-1中对功能服务进行排序的替代方法,您可以选择让功能服务相互调用,从表面上看,这种方法似乎两害相权取其轻,如图 2-3所示。
As an alternative to sequencing the functional services as in Figure 2-1, you can opt for what, on the face of it, appears as a lesser evil by having the functional services call each other, as shown in Figure 2-3.
图 2-3链接功能服务
Figure 2-3 Chaining functional services
这样做的好处是可以让客户端保持简单,甚至是异步的:客户端向 A 服务发出调用。然后 A 服务调用 B,B 再调用 C。
The advantage of doing so is that you get to keep the clients simple and even asynchronous: the clients issue the call to the A service. The A service then calls B, and B calls C.
现在的问题是功能服务彼此耦合,并且与功能调用顺序耦合。例如,您只能在 Invoicing 服务之后、Shipping 服务之前调用 Billing 服务。在图 2-3 的情况下,A 服务内置了需要调用 B 服务的知识。B 服务只能在 A 服务之后、C 服务之前调用。调用所需顺序的变化可能会影响链上上下下的所有服务,因为它们的实现必须更改以反映新的所需顺序。
The problem now is that the functional services are coupled to each other and to the order of the functional calls. For example, you can call the Billing service only after the Invoicing service but before the Shipping service. In the case of Figure 2-3, built into the A service is the knowledge that it needs to call the B service. The B service can be called only after the A service and before the C service. A change in the required ordering of the calls is likely to affect all services up and down the chain because their implementation will have to change to reflect the new required order.
但是图 2-3 并未展示全貌。图 2-3 中的 B 服务与图 2-1 中的 B 服务截然不同。原始的 B 服务仅执行 B 功能。图 2-3 中的 B 服务必须了解 C 服务,并且 B 的契约必须包含 C 服务执行其功能所需的参数。这些细节在图 2-1 中是客户端的责任。A 服务使问题变得更加复杂:它现在必须在其服务契约中容纳调用 B 和 C 服务所需的参数,以使它们执行各自的业务功能。对 B 和 C 功能的任何更改都会反映在 A 服务实现的更改中,而 A 服务现在与它们耦合。图 2-4 描述了这种膨胀和耦合。
But Figure 2-3 does not reveal the full picture. The B service of Figure 2-3 is drastically different from that of Figure 2-1. The original B service performed only the B functionality. The B service in Figure 2-3 must be aware of the C service, and the B contract must contain the parameters that will be required by the C service to perform its functionality. These details were the responsibility of the client in Figure 2-1. The problem is compounded by the A service, which must now accommodate in its service contract the parameters required for calling the B and the C services for them to perform their respective business functionality. Any change to the B and C functionality is reflected in a change to the implementation of the A service, which is now coupled to them. This kind of bloating and coupling is depicted in Figure 2-4.
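The contract bloat of Figure 2-4 can be sketched as follows. The service classes, the `do_*` methods, and the `contract_params` helper are hypothetical names chosen for this illustration. What matters is that the parameters of C leak into B's contract, and the parameters of both B and C leak into A's.

```python
import inspect


class ServiceC:
    def do_c(self, c_arg: str) -> list:
        return [f"C({c_arg})"]


class ServiceB:
    """B must know about C and carry C's parameters in its own contract."""

    def __init__(self, c: ServiceC):
        self._c = c

    def do_b(self, b_arg: str, c_arg: str) -> list:
        return [f"B({b_arg})"] + self._c.do_c(c_arg)


class ServiceA:
    """A must know about B and carry the parameters of both B and C."""

    def __init__(self, b: ServiceB):
        self._b = b

    def do_a(self, a_arg: str, b_arg: str, c_arg: str) -> list:
        return [f"A({a_arg})"] + self._b.do_b(b_arg, c_arg)


def contract_params(method) -> list:
    """List the parameters a caller must supply (everything except 'self')."""
    return [name for name in inspect.signature(method).parameters if name != "self"]
```

Here `contract_params(ServiceA.do_a)` lists three parameters even though A itself uses only one of them, so any change to what B or C need ripples up into A's contract and implementation.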
图 2-4链接功能导致服务臃肿。
Figure 2-4 Chaining functionality leads to bloated services.
遗憾的是,即使图 2-4 也没有说明全部事实。假设 A 服务成功执行了 A 功能,然后继续调用 B 服务来执行 B 功能。但是,B 服务遇到了错误,无法正确执行。如果 A 同步调用 B,则 A 必须密切了解 B 的内部逻辑和状态,才能从其错误中恢复。这意味着 B 功能也必须驻留在 A 服务中。如果 A 异步调用 B,那么 B 服务现在必须以某种方式返回到 A 服务并撤消 A 功能,或者在其自身内包含 A 的回滚。换句话说,A 功能也驻留在 B 服务中。这会导致 B 服务与 A 服务之间紧密耦合,并使 B 服务膨胀,因为它需要补偿 A 服务的成功。这种情况如图 2-5 所示。
Sadly, even Figure 2-4 does not tell the whole truth. Suppose the A service performed the A functionality successfully and then proceeded to calling the B service to perform the B functionality. The B service, however, encountered an error and failed to execute properly. If A called B synchronously, then A must be intimately aware of the internal logic and state of B in order to recover its error. This means the B functionality must also reside in the A service. If A called B asynchronously, then the B service must now somehow reach back to the A service and undo the A functionality or contain the rollback of A within itself. In other words, the A functionality also resides in the B service. This creates tight coupling between the B service and the A service and bloats the B service with the need to compensate for the success of the A service. This situation is shown in Figure 2-5.
图 2-5补偿引起的额外膨胀和耦合
Figure 2-5 Additional bloating and coupling due to compensation
问题在 C 服务中变得更加复杂。如果 A 和 B 功能都成功完成,但 C 服务未能执行其业务功能,会怎么样?C 服务必须返回到 B 和 A 服务来撤消它们的操作。这会导致 C 服务更加臃肿,并将其与 A 和 B 服务耦合在一起。考虑到图 2-5 中的耦合和臃肿,如何才能将 B 服务替换为执行方式不同于 B 的 B' 服务?这会对 A 和 C 服务产生什么不利影响?同样,当在其他上下文中需要服务中的功能时(例如在 D 服务之后、E 服务之前调用 B 服务),图 2-5 中的重用程度如何?A、B 和 C 是三个不同的服务,还是一个混乱的混合体?
The issue is compounded in the C service. What if both the A and B functionalities succeeded and completed, but the C service failed to perform its business function? The C service must reach back to both the B and the A services to undo their operations. This creates far more bloating in the C service and couples it to the A and B services. Given the coupling and bloating in Figure 2-5, what will it take to replace the B service with a B' service that performs the functionality differently than B? What will be the adverse effects on the A and C services? Again, what degree of reuse exists in Figure 2-5 when the functionality in the services is asked for in other contexts, such as calling the B service after the D service and before the E service? Are A, B, and C three distinct services or just one fused mess?
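The compensation problem of Figure 2-5 can be sketched by letting the last service in the chain fail. Everything here (the class names, the shared log, the `run_chain` wiring helper) is an assumption for illustration only. The point is that C needs references back to both upstream services in order to undo their work.

```python
class ServiceA:
    def __init__(self, log: list):
        self.log = log
        self.b = None  # wired later; A is coupled to B

    def do_a(self) -> None:
        self.log.append("A done")
        self.b.do_b()

    def undo_a(self) -> None:
        self.log.append("A undone")


class ServiceB:
    def __init__(self, log: list):
        self.log = log
        self.c = None  # wired later; B is coupled to C

    def do_b(self) -> None:
        self.log.append("B done")
        self.c.do_c()

    def undo_b(self) -> None:
        self.log.append("B undone")


class ServiceC:
    """C must reach back to B and A to compensate for their success."""

    def __init__(self, log: list, a: ServiceA, b: ServiceB, fail: bool):
        self.log, self.a, self.b, self.fail = log, a, b, fail

    def do_c(self) -> None:
        if self.fail:
            self.b.undo_b()
            self.a.undo_a()
            self.log.append("C failed")
            return
        self.log.append("C done")


def run_chain(fail_c: bool) -> list:
    """Wire A -> B -> C (with C pointing back at A and B) and run the chain."""
    log: list = []
    a, b = ServiceA(log), ServiceB(log)
    c = ServiceC(log, a, b, fail_c)
    a.b, b.c = b, c
    a.do_a()
    return log
```

With these circular references in place, swapping B for a B' service, or reusing B between a D and an E service, requires touching A and C as well, which is the fused mess the text describes.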
功能分解具有几乎不可抗拒的吸引力。它看起来是一种简单明了的系统设计方法,只需要你简单地列出所需的功能,然后在架构中为每个功能创建一个组件。功能分解(及其同类,稍后讨论的域分解)是大多数系统的设计方式。大多数人自然而然地选择功能分解,这很可能是你的计算机科学教授在学校向你展示的。功能分解在设计不良的系统中盛行,这几乎完美地表明了应该避免的事情。无论如何,你必须抵制功能分解的诱惑。
Functional decomposition holds an almost irresistible allure. It looks like a simple and clear way of designing the system, requiring you to simply list the required functionalities and then create a component in your architecture for each. Functional decomposition (and its kin, the domain decomposition discussed later) is how most systems are designed. Most people choose functional decomposition naturally, and it is likely what your computer science professor showed you in school. The prevalence of functional decomposition in poorly designed systems makes a near-perfect indicator of something to avoid. At all costs, you must resist the temptations of functional decomposition.
无需使用任何软件工程论据,你就可以证明功能分解永远无法发挥作用。这一证明与宇宙的本质有关,具体来说,就是热力学第一定律。抛开数学,热力学第一定律简单地说,不付出努力就无法创造价值。俗话说:“天下没有免费的午餐。”
You can prove that functional decomposition is precluded from ever working without using a single software engineering argument. The proof has to do with the very nature of the universe, specifically, the first law of thermodynamics. Stripping away the math, the first law of thermodynamics simply states that you cannot add value without sweating. A colloquial way of saying the same is: “There ain’t no such thing as a free lunch.”
设计本质上是一种高附加值活动。你之所以读这本书而不是其他编程书籍,是因为你重视设计,或者换句话说,你认为设计增加了价值,甚至带来了很多价值。
Design, by its very nature, is a high-added-value activity. You are reading this book instead of yet another programming book because you value design, or put differently, you think design adds value, or even a lot of value.
功能分解的问题在于它试图欺骗热力学第一定律。功能分解的结果,即系统设计,应该是一项高附加值活动。然而,功能分解简单而直接:给定一组要求执行A、B和C功能的需求,您可以将其分解为A、B和C服务。“不费吹灰之力!”您会说。“功能分解非常简单,一个工具就可以完成。”然而,正是因为它是一种快速、简单、机械和直接的设计,它也表现出与热力学第一定律的矛盾。既然不费吹灰之力就无法增加价值,那么功能分解如此吸引人的属性正是那些阻碍功能分解增加价值的属性。
The problem with functional decomposition is that it endeavors to cheat the first law of thermodynamics. The outcome of a functional decomposition, namely, system design, should be a high-added-value activity. However, functional decomposition is easy and straightforward: given a set of requirements that call for performing the A, B, and C functionalities, you decompose into the A, B, and C services. “No sweat!” you say. “Functional decomposition is so easy that a tool could do it.” However, precisely because it is a fast, easy, mechanistic, and straightforward design, it also manifests a contradiction to the first law of thermodynamics. Since you cannot add value without effort, the very attributes that make functional decomposition so appealing are those that preclude functional decomposition from adding value.
说服同事和经理们做除功能分解以外的任何事情都将是一场艰苦的斗争。他们会说:“我们一直都是这样做的。”有两种方法可以反驳这种说法。第一种是回答:“我们有多少次按时完成任务或按预算完成任务了?我们的质量和复杂性如何?维护系统有多容易?”
It will be an uphill struggle to convince colleagues and managers to do anything other than functional decomposition. “We have always done it that way,” they will say. There are two ways to counter that argument. The first is replying, “And how many times have we met the deadline or the budget to which we committed? What were our quality and complexity like? How easy was it to maintain the system?”
第二种是进行反设计。告诉团队你正在为下一代系统进行设计竞赛。将团队分成两半,让他们分别在不同的会议室里进行讨论。让前半组人提出系统的最佳设计。让后半组人提出最糟糕的设计:一种会最大程度地降低系统扩展和维护能力的设计,一种不允许重用的设计,等等。让他们花一个下午的时间研究这个问题,然后把他们聚在一起。当你比较结果时,你通常会发现他们提出了相同的设计。组件上的标签可能不同,但设计的本质是相同的。现在才承认他们不是在研究同一个问题,并讨论其影响。也许这次需要一种不同的方法。
The second is to perform an anti-design effort. Inform the team that you are conducting a design contest for the next-generation system. Split the team into halves, each in a separate conference room. Ask the first half to produce the best design for the system. Ask the second half to produce the worst possible design: a design that will maximize your inability to extend and maintain the system, a design that will disallow reuse, and so on. Let them work on it for one afternoon and then bring them together. When you compare the results, you will usually see they have produced the same design. The labels on the components may differ, but the essence of the design will be the same. Only now confess that they were not working on the same problem and discuss the implications. Perhaps a different approach is called for this time.
事实上,永远不要使用功能分解进行设计,这是一个普遍的看法,与软件系统无关。考虑从功能上建造一栋房子,就好像它是一个软件系统。首先列出房子所需的所有功能,例如烹饪、玩耍、休息、睡觉等。然后,在架构中为每个功能创建一个实际组件,如图2-6所示。
The fact you should never design using functional decomposition is a universal observation that has nothing to do with software systems. Consider building a house functionally, as if it were a software system. You start by listing all the required functionalities of the house, such as cooking, playing, resting, sleeping, and so on. You then create an actual component in the architecture for each functionality, as shown in Figure 2-6.
图 2-6房屋的功能分解
Figure 2-6 Functional decomposition of a house
虽然图 2-6已经很荒谬了,但真正的疯狂只有在建造这栋房子的时候才会显现出来。你从一块干净的土地开始,建造厨房。只是做饭。你从盒子里拿出微波炉放在一边。倒上一个小混凝土垫,在垫子上建一个木框架,盖上台面,把微波炉放在上面。为微波炉建一个小食品储藏室,在上面用锤子敲一个小屋顶,只把微波炉连接到电网上。“我们做饭了!”你向老板和顾客宣布。
While Figure 2-6 is already preposterous, the true insanity becomes evident only when it is time to build this house. You start with a clean plot of land and build cooking. Just cooking. You take a microwave oven out of its box and put it aside. You pour a small concrete pad, build a wood frame on the pad, cover it with a countertop, and place the microwave on it. You build a small pantry for the microwave, hammer a tiny roof over it, and connect just the microwave to the power grid. “We have cooking!” you announce to the boss and customers.
但烹饪真的完成了吗?烹饪能以这种方式完成吗?你在哪里上菜、储存剩菜或处理垃圾?在煤气炉上做饭怎么样?如何才能复制这种烹饪方式?炉子?在两种不同的烹饪功能表达方式之间,你能实现多大程度的重用?你可以轻松地扩展其中任何一种吗?在其他地方用微波炉做饭怎么样?重新安置微波炉需要做些什么?所有这些乱象甚至还不是开始,因为这完全取决于你进行的烹饪类型。如果烹饪涉及多种器具且因情况而异,你可能需要构建单独的烹饪功能 — — 例如,如果你要煮早餐、午餐、晚餐、甜点或小吃。你最终要么得到大量微小烹饪服务,每种服务都专用于必须提前知道的特定场景,要么得到拥有一切的大规模烹饪服务。你会建这样的房子吗?如果没有,为什么要以这种方式设计和构建软件系统?
But is cooking really done? Can cooking ever be done this way? Where are you serving the meal, storing the leftovers, or disposing of trash? What about cooking over the gas stove? What will it take to duplicate this feat for cooking over the stove? What degree of reuse can you have between the two separate ways of expressing the functionality of cooking? Can you extend any one of them easily? What about cooking with a microwave somewhere else? What does it take to relocate the microwave? All of this mess is not even the beginning because it all depends on the type of cooking you perform. You may need to build separate cooking functionality if cooking involves multiple appliances and differs by context (for example, if you are cooking breakfast, lunch, dinner, dessert, or snacks). You end up with either an explosion of minute cooking services, each dedicated to a specific scenario that must be known in advance, or a massive cooking service that has it all. Will you ever build a house like that? If not, why design and build a software system that way?
图 2-6中的房屋设计显然是荒谬的。在你的房子里,你很可能在厨房做饭,所以图 2-7显示了房屋的另一种分解方式。这种分解形式称为域分解:根据业务领域(如销售、工程、会计和运输)将系统分解为构建块。遗憾的是,图 2-7所示的域分解甚至比图 2-6的功能分解还要糟糕。域分解不起作用的原因是它仍然是伪装的功能分解:Kitchen是你做饭的地方,Bedroom是你睡觉的地方,Garage是你停车的地方,等等。
The house design in Figure 2-6 is obviously absurd. In your house, you likely do the cooking in the kitchen, so an alternative decomposition of the house is shown in Figure 2-7. This form of decomposition is called domain decomposition: decomposing a system into building blocks based on the business domains, such as sales, engineering, accounting, and shipping. Sadly, domain decomposition such as that shown in Figure 2-7 is even worse than the functional decomposition of Figure 2-6. The reason domain decomposition does not work is that it is still functional decomposition in disguise: Kitchen is where you do the cooking, Bedroom is where you do the sleeping, Garage is where you do the parking, and so on.
图 2-7房屋的域分解
Figure 2-7 Domain decomposition of a house
事实上,图 2-6中的每个功能区域都可以映射到图 2-7中的域,这会带来严重的问题。虽然每个卧室可能都是独一无二的,但你必须在所有卧室中复制睡眠功能。当在客厅的电视机前睡觉或在厨房招待客人时(因为几乎所有的家庭聚会都在厨房举行),就会发生进一步的重复。每个域通常都会演变成丑陋的功能大杂烩,从而增加了域的内部复杂性。增加的内部复杂性使您避免了跨域连接的痛苦,跨域通信通常简化为简单的状态更改(类似 CRUD),而不是触发涉及所有域的必需行为执行的操作。跨域组合更复杂的行为非常困难。在这样的域分解中,某些功能根本无法实现。例如,在图 2-7中的房子里,您会在哪里进行无法在厨房进行的烹饪(例如,烧烤)?
In fact, every one of the functional areas of Figure 2-6 can be mapped to domains in Figure 2-7, which presents severe problems. While each bedroom may be unique, you must duplicate the functionality of sleeping in all of them. Further duplication occurs when sleeping in front of the TV in the living room or when entertaining guests in the kitchen (as almost all house parties end up in the kitchen). Each domain often devolves into an ugly grab bag of functionality, increasing the internal complexity of the domain. The increased inner complexity causes you to avoid the pain of cross-domain connectivity, and communication across domains is typically reduced to simple state changes (CRUD-like) rather than actions triggering required behavior execution involving all domains. Composing more complex behaviors across domains is very difficult. Some functionalities are simply impossible in such domain decompositions. For example, in the house in Figure 2-7, where would you perform cooking that cannot take place in the kitchen (e.g., a barbecue)?
与纯函数式方法一样,域分解的真正问题在构建过程中变得明显。想象一下按照图 2-7的分解建造一栋房子。你从一块干净的土地开始。你为厨房挖一条沟渠作为地基,为地基浇注混凝土(仅用于厨房),并在混凝土中添加螺栓。然后,你竖起厨房墙壁(所有墙壁都必须是外墙);将它们用螺栓固定在地基上;在墙壁内穿电线和管道;将厨房连接到水、电和煤气供应;将厨房连接到下水道排放口;添加加热和冷却管道和通风口;将厨房连接到炉子;添加水表、电表和煤气表;在厨房上方搭建屋顶;在内部拧上石膏板;挂上橱柜;用灰泥涂抹外墙(所有墙壁);然后粉刷。你向客户宣布Kitchen已完成并已达到里程碑 1.0。
As with the pure functional approach, the real problems with domain decomposition become evident during construction. Imagine building a house along the decomposition of Figure 2-7. You start with a clean plot of land. You dig a trench for the foundation for the kitchen, pour concrete for the foundation (just for the kitchen), and add bolts in the concrete. You then erect the kitchen walls (all have to be exterior walls); bolt them to the foundation; run electrical wires and plumbing in the walls; connect the kitchen to the water, power, and gas supplies; connect the kitchen to the sewer discharge; add heating and cooling ducts and vents; connect the kitchen to a furnace; add water, power, and gas meters; build a roof over the kitchen; screw drywall on the inside; hang cabinets; coat the outside walls (all walls) with stucco; and paint it. You announce to the customer that the Kitchen is done and that milestone 1.0 is met.
然后你去卧室。你首先把厨房墙上的灰泥敲掉,露出连接墙壁和地基的螺栓,然后把厨房的螺栓卸下来从地基上拆除厨房。你断开厨房的电源、煤气、水源和下水道排水,然后使用昂贵的液压千斤顶将厨房抬起。在将厨房悬空的同时,你将其移到一边,这样你就可以用手提钻拆除厨房的地基,将碎屑运走,并支付昂贵的倾倒费。现在,你可以挖一条新的沟槽,其中包含卧室和厨房的连续地基。你将混凝土倒入沟槽中以铸造新地基,并添加螺栓,希望螺栓的位置与之前完全相同。接下来,你非常小心地将厨房放回新地基的顶部,确保所有螺栓孔对齐(这几乎是不可能的)。你为卧室建造了新的墙壁。你暂时将橱柜从厨房墙壁上移开;拆除石膏板以露出内部电线、管道和管道;并将管道、水管和电线连接到卧室的管道、水管和电线。你在厨房和卧室里加了石膏板,重新挂上厨房橱柜,并在卧室里加了壁橱。你把厨房墙上剩余的灰泥全部打掉,这样就可以在外墙上涂抹连续、无裂缝的灰泥。现在你必须将厨房以前的几面外墙改建成内墙,这涉及到灰泥、隔热层、油漆等。你拆除了厨房的屋顶,在卧室和厨房上方建造了一个新的连续屋顶。你向客户宣布里程碑 2.0 已经实现,一切Bedroom 1就绪。
Then you move on to the bedroom. You first bust the stucco off the kitchen walls to expose the bolts connecting the walls to the foundation and unbolt the kitchen from the foundation. You disconnect the kitchen from the power supply, gas supply, water supply, and sewer discharge and then use expensive hydraulic jacks to lift the kitchen. While suspending the kitchen in midair, you shift it to the side so that you can demolish the foundation for the kitchen with jackhammers, hauling the debris away and paying expensive dump fees. Now you can dig a new trench that will contain a continuous foundation for the bedroom and the kitchen. You pour concrete into the trenches to cast the new foundation and add the bolts hopefully at exactly the same spots as before. Next, you very carefully lower the kitchen back on top of the new foundation, making sure all the bolt holes align (this is next to impossible). You erect new walls for the bedroom. You temporarily remove the cabinets from the kitchen walls; remove the drywall to expose the inner electrical wires, pipes, and ducts; and connect the ducts, plumbing, and wires to those of the bedroom. You add drywall in the kitchen and the bedroom, rehang the kitchen cabinets, and add closets in the bedroom. You knock down any remaining stucco from the walls of the kitchen so that you can apply continuous, crack-free stucco on the outside walls. You must convert several of the previous outside walls of the kitchen to internal walls now, with implications on stucco, insulation, paint, and so on. You remove the roof of the kitchen and build a new continuous roof over the bedroom and the kitchen. You announce to the customer that milestone 2.0 is met, and Bedroom 1 is done.
你必须重建厨房的事实并未透露。第二次建造厨房比第一次要昂贵得多且风险更大这一事实也未透露。要在这个房子里再加一间卧室需要花费多少钱?你最终要建造和拆除厨房多少次?在厨房坍塌成一堆无用的碎片之前,你实际上可以重建厨房多少次?当你宣布厨房完工时,它真的完工了吗?除了返工罚款之外,房子的各个部分之间的可重用程度如何?以这种方式建造房屋要贵多少?为什么以这种方式构建软件系统是有意义的?
The fact that you had to rebuild the kitchen is not disclosed. The fact that building the kitchen the second time around was much more expensive and riskier than the first time is also undisclosed. What will it take to add another bedroom to this house? How many times will you end up building and demolishing the kitchen? How many times can you actually rebuild the kitchen before it crumbles into a shifting pile of useless debris? Was the kitchen really done when you announced it so? Rework penalties aside, what degree of reuse is there between the various parts of the house? How much more expensive is building a house this way? Why would it make sense to build a software system this way?
功能或领域分解的动机是业务或客户希望尽快实现其功能。问题是您永远无法孤立地部署单个功能。独立于 Invoicing 和 Shipping 的 Billing 没有任何商业价值。
The motivation for functional or domain decomposition is that the business or the customer wants its feature as soon as possible. The problem is that you can never deploy a single feature in isolation. There is no business value in Billing independent from Invoicing and Shipping.
如果涉及遗留系统,情况就更糟了。开发人员很少有机会开发全新的绿地系统。最有可能的是,存在一个按功能设计的、日渐衰败的现有系统,其缺乏灵活性和高昂的维护成本足以证明开发新系统是合理的。
The situation is even worse when legacy systems are involved. Rarely do developers get the privilege of a completely new, green-field system. Most likely there is an existing, decaying system that was designed functionally whose inflexibility and maintenance costs justify the new system.
假设您的企业在遗留系统中运行三个功能 A、B 和 C。在构建新系统替换旧系统时,您决定先构建并(更重要的是)部署 A 功能,以满足希望尽早并经常看到价值的客户和经理。问题在于,企业单独使用 A 毫无用处。企业还需要 B 和 C。在新系统中执行 A、在旧系统中执行 B 和 C 是行不通的,因为旧系统不了解新系统,无法只执行 B 和 C。同时在旧系统和新系统中执行 A 不会增加任何价值,甚至由于重复工作而产生负价值,因此用户可能会反抗。唯一的解决方案是以某种方式协调新旧系统。协调的复杂性通常远远超过原始底层业务问题的挑战,因此开发人员最终要解决的是一个更为复杂的问题。再次使用房屋类比,如果住在狭窄的老房子里,同时按照图 2-6 或图 2-7 在城镇的另一边建造新房子,那会是什么感觉?假设你只在新房子里建造烹饪区或厨房,而你继续住在老房子里。每次你饿了,你都必须开车去新房子再回来。你不会接受自己的房子是这样,所以你不应该对你的客户施加这种虐待。
Suppose your business has three functionalities A, B, and C, running in a legacy system. When building a new system to replace the old, you decide to build and, more important, deploy the A functionality first to satisfy the customers and managers who wish to see value early and often. The problem is that the business has no use for just A on its own. The business needs B and C as well. Performing A in the new system and B and C in the old system will not work, because the old system does not know about the new system and cannot execute just B and C. Doing A in both the old system and the new system adds no value and even has negative value due to the repeated work, so users are likely to revolt. The only solution is to somehow reconcile the old and the new systems. The reconciliation typically far eclipses in complexity the challenge of the original underlying business problem, so developers end up solving a far more complex problem. To use the house analogy again, what would it be like to live in a cramped old house while building a new house on the other side of town according to Figure 2-6 or Figure 2-7? Suppose you are building just cooking or the kitchen in the new house while continuing to live in the old house. Every time you are hungry, you have to drive to the new house and come back. You would not accept it with your house, so you should not inflict this kind of abuse on your customers.
A crucial flaw of both functional and domain decomposition has to do with testing. With such designs, the level of coupling and complexity is so high that the only kind of testing developers can do is unit testing. However, that does not make unit testing important; it is merely another example of the streetlight effect[1] (i.e., searching for something where it is easiest to look).
1. https://en.wikipedia.org/wiki/Streetlight_effect
The sad reality is that unit testing is borderline useless. While unit testing is an essential part of testing, it cannot really test a system. Consider a jumbo jet that has numerous internal components (pumps, actuators, servos, gears, turbines, etc.). Now suppose all components have independently passed unit testing perfectly, but that is the only testing that took place before the components were assembled into an aircraft. Would you dare board that airplane? The reason unit testing is so marginal is that in any complex system, the defects are not going to be in any of the units but rather are the result of the interactions between the units. This is why you instinctively know that, while each component in the jumbo jet example works, the aggregate could be horribly wrong. Worse, even if the complex system is at a perfect state of impeccable quality, changing a single, unit-tested component could break some other unit(s) relying on an old behavior. You must repeat testing of all units when changing a single unit. Even then it would be meaningless because the change to one of the components could affect some interaction between other components or a subsystem, which no unit testing could discover. The only way to verify change is full regression testing of the system, its subsystems, its components and interactions, and finally its units. If, as a result of your change, other units need to change, the effect on regression testing is nonlinear. The inefficacy of unit testing is not a new observation and has been demonstrated across thousands of well-measured systems.
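The point that defects live in the interactions rather than in the units can be sketched in a few lines of Python. The two functions below are hypothetical (they are not from the book): each one passes its own unit tests, yet wiring them together silently mixes feet and meters, so the assembled behavior is wrong even though no single unit is.

```python
FEET_PER_METER = 3.28084


def altitude_from_sensor(raw_counts: int) -> float:
    """Unit A (hypothetical): convert raw sensor counts to altitude in FEET."""
    return raw_counts * 10.0


def should_lower_gear(altitude: float) -> bool:
    """Unit B (hypothetical): lower the landing gear below 300 METERS."""
    return altitude < 300.0


def gear_command(raw_counts: int) -> bool:
    """Integration as shipped: feeds feet into a function that expects meters."""
    return should_lower_gear(altitude_from_sensor(raw_counts))


def gear_command_fixed(raw_counts: int) -> bool:
    """Integration with the missing unit conversion added."""
    return should_lower_gear(altitude_from_sensor(raw_counts) / FEET_PER_METER)


# Each unit passes its own unit tests in isolation.
assert altitude_from_sensor(50) == 500.0
assert should_lower_gear(299.0) is True
assert should_lower_gear(301.0) is False

# Only a test of the interaction exposes the defect:
# 500 feet is roughly 152 meters, so the gear should be lowered.
print(gear_command(50))        # the unconverted wiring decides against lowering
print(gear_command_fixed(50))  # the corrected wiring lowers the gear
```

No amount of additional unit testing of either function would reveal the problem; only a test that exercises the two together can.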
In theory, you could perform regression testing even on a functionally decomposed system. In practice, the complexity of that task would set the bar very high. The sheer number of the functional components would make testing all the interactions impractical. The very large services would be internally so complex that no one could effectively devise a comprehensive strategy that tests all code paths through such services. With functional decomposition, most developers give up and perform just simple unit testing. Therefore, by precluding regression testing, functional decomposition makes the entire system untestable, and untestable systems are always rife with defects.
Instead of a house, consider the following simplified requirements for a stock trading system for a financial company:
The system should enable in-house traders to:
– Buy and sell stocks
– Schedule trades
– Issue reports
– Analyze the trades
The users of the system utilize a browser to connect to the system and manage connected sessions, completing a form and submitting the request.
After a trade, report, or analysis request, the system sends an email to the users confirming their request or containing the results.
The data should be stored in a local database.
A straightforward functional decomposition would yield the design of Figure 2-8.
Figure 2-8 Functional trading system
Each of the functional requirements is expressed in a respective component of the architecture. Figure 2-8 represents a common design to which many novice software developers would gravitate without hesitation.
The flaws of such a system design are many. It is very likely the client in the present system is the one that orchestrates Buying Stocks, Selling Stocks, and Trade Scheduling; issues a report with Reporting; and so on. Suppose the user wants to fund purchasing of a certain number of stocks by selling other stocks. This means two orders: first sell and then buy. But what should the client do if by the time these two transactions take place, the price of the stocks sold has dropped or the price of the bought stocks has risen so that the selling cannot fulfill the buying? Should the client buy just as many as possible? Should it perhaps sell more stocks than intended? Should it dip into the cash account behind the trading account to supplement the order? Should it abort the whole thing? Should it ask for user assistance? The exact resolution is immaterial for this discussion. Whatever the resolution, it requires business logic, which now resides in the client.
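To make the problem concrete, here is a minimal sketch of the kind of code that ends up inside the web client. Everything in it is an assumption for illustration: the `Order` type, the function name, and in particular the choice of the "buy as many as possible" policy, which is only one of the resolutions listed above.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical order record; prices are in cents to avoid float rounding."""
    symbol: str
    quantity: int
    price_cents: int  # price per share at execution time


def web_client_fund_buy_with_sale(sell: Order, buy: Order) -> int:
    """Business logic living in the client: decide how many shares to buy.

    Illustrative policy only: if the sale no longer covers the purchase,
    buy as many shares as the proceeds allow.
    """
    proceeds = sell.quantity * sell.price_cents
    affordable = proceeds // buy.price_cents
    return min(buy.quantity, affordable)


# The sold stock dropped to $9.00 while the bought stock is at $10.00,
# so the proceeds of 10 shares no longer cover 10 shares.
shares = web_client_fund_buy_with_sale(
    Order("SOLD", 10, 900),
    Order("BOUGHT", 10, 1000),
)
print(shares)
```

Whatever policy is chosen, it is now a decision encoded in the web client, and any second client has to make (and maintain) the same decision again.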
What will it take to change the client from a web portal to a mobile device? Would that not mean duplicating the business logic into the mobile device? It is likely that little of the business logic and the effort invested in developing it for the web client can be salvaged and reused in the mobile client because it is embedded in the web portal. Over time, the developers will end up maintaining several versions of the business logic in multiple clients.
Per the requirements, Buying Stocks, Selling Stocks, Trade Scheduling, Reporting, and Analyzing all respond to the user with an email listing their activities. What if the users prefer to receive a text message (or a paper letter) instead of an email? You will have to change the implementation of Buying Stocks, Selling Stocks, Trade Scheduling, Reporting, and Analyzing activities from an email to a text message.
Per the design decision, the data is stored in a database, and Buying Stocks, Selling Stocks, Trade Scheduling, Reporting, and Analyzing all access that database. Now suppose you decide to move the data storage from the local database to a cloud-based solution. At the very least, this will force you to change the data-access code in Buying Stocks, Selling Stocks, Trade Scheduling, Reporting, and Analyzing to go from a local database to a cloud offering. The way you structure, access, and consume the data has to change across all components.
What if the client wishes to interact with the system asynchronously, issuing a few trades and collecting the results later? You built the components with the notion of a connected, synchronous client that orchestrates the components. You will likely need to rewrite Buying Stocks, Selling Stocks, Trade Scheduling, Reporting, and Analyzing activities to orchestrate each other, along the lines of Figure 2-5.
Often, financial portfolios are comprised of multiple financial instruments besides stocks, such as currencies, bonds, commodities, and even options and futures on those instruments. What if the users of the system wish to start trading currencies or commodities instead of stocks? What if the users demand a single application, rather than several applications, to manage all of their portfolios? Buying Stocks, Selling Stocks, and Trade Scheduling are all about stocks and cannot handle currencies or bonds, requiring you to add additional components (like Figure 2-6). Similarly, Reporting and Analyzing need a major rewrite to accommodate reporting and analysis of trades other than stocks. The client needs a rewrite to accommodate the new trade items.
Even without branching to commodities, what if you must localize the application to foreign markets? At the very least, the client will need a serious makeover to accommodate language localization, but the real effect is going to be the system components again. Foreign markets are going to have different trading rules, regulations, and compliance requirements, drastically affecting what the system is allowed to do and how it is to go about trading. This will mean much rework to Buying Stocks, Selling Stocks, Trade Scheduling, Reporting, and Analyzing whenever entering a new locale. You are going to end up with either bloated god services that can trade in any market or a version of the system for each deployment locale.
Finally, all components presently connect to some stock ticker feed that provides them with the latest stock values. What is required to switch to a new feed provider or to incorporate multiple feeds? At the very least, Buying Stocks, Selling Stocks, Trade Scheduling, Reporting, and Analyzing will require work to move to a new feed, connect to it, handle its errors, pay for its service, and so on. There are also no guarantees that the new feed uses the same data format as the old one. All components require some conversion and transformation work as well.
The Method’s design directive is:
Decompose based on volatility.
Volatility-based decomposition identifies areas of potential change and encapsulates those into services or system building blocks. You then implement the required behavior as the interaction between the encapsulated areas of volatility.
The motivation for volatility-based decomposition is simplicity itself: any change is encapsulated, containing the effect on the system.
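As a rough illustration of encapsulating one area of volatility, consider how users are notified (email today, perhaps text messages tomorrow). The sketch below is not the book's design; `Notifier`, `EmailNotifier`, `TextMessageNotifier`, and `TradeWorkflow` are hypothetical names, and the notifiers return strings instead of sending anything so that the example stays self-contained.

```python
from typing import Protocol


class Notifier(Protocol):
    """The boundary of the vault: callers see only this contract."""

    def notify(self, user: str, message: str) -> str: ...


class EmailNotifier:
    def notify(self, user: str, message: str) -> str:
        # A real implementation would talk to a mail server here.
        return f"email to {user}: {message}"


class TextMessageNotifier:
    def notify(self, user: str, message: str) -> str:
        # A real implementation would call a text-message gateway here.
        return f"text to {user}: {message}"


class TradeWorkflow:
    """Required behavior implemented as an interaction with the encapsulated area."""

    def __init__(self, notifier: Notifier) -> None:
        self._notifier = notifier

    def buy(self, user: str, symbol: str, quantity: int) -> str:
        # Trade execution is elided; only the notification step is shown.
        return self._notifier.notify(user, f"bought {quantity} {symbol}")


print(TradeWorkflow(EmailNotifier()).buy("ann", "ACME", 5))
print(TradeWorkflow(TextMessageNotifier()).buy("ann", "ACME", 5))
```

Switching from email to text messages touches only the notifier passed in; `TradeWorkflow` does not change, which is the containment the directive is after.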
When you use volatility-based decomposition, you start thinking of your system as a series of vaults, as in Figure 2-9.
Figure 2-9 Encapsulated areas of volatility (Images: media500/Shutterstock; pikepicture/Shutterstock)
Any change is potentially very dangerous, like a hand grenade with the pin pulled out. Yet, with volatility-based decomposition, you open the door of the appropriate vault, toss the grenade inside, and close the door. Whatever was inside the vault may be destroyed completely, but there is no shrapnel flying everywhere, destroying everything in its path. You have contained the change.
With functional decomposition, your building blocks represent areas of functionality, not volatility. As a result, when a change happens, by the very definition of the decomposition, it affects multiple (if not most) of the components in your architecture. Functional decomposition therefore tends to maximize the effect of the change. Since most software systems are designed functionally, change is often painful and expensive, and the system is likely to resonate with the change. Changes made in one area of functionality trigger other changes and so on. Accommodating change is the real reason you must avoid functional decomposition.
All the other problems with functional decomposition pale when compared with the poor ability and high cost of handling change. With functional decomposition, a change is like swallowing a live hand grenade.
What you choose to encapsulate can be functional in nature, but hardly ever is it domain-functional, meaning it has no meaning for the business. For example, the electricity that powers a house is indeed an area of functionality but is also an important area to encapsulate for two reasons. The first reason is that power in a house is highly volatile: power can be AC or DC; 110 volts or 220 volts; single phase or three phases; 50 hertz or 60 hertz; produced by solar panels on the roof, a generator in the backyard, or plain grid connectivity; delivered on wires with different gauges; and on and on. All that volatility is encapsulated behind a receptacle. When it is time to consume power, all the user sees is an opaque receptacle, encapsulating the power volatility. This decouples the power-consuming appliances from the power volatility, increasing reuse, safety, and extensibility while reducing overall complexity. It makes using power in one house indistinguishable from using it in another, highlighting the second reason it is valid to identify power as something to encapsulate in the house. While powering a house is an area of functionality, in general, the use of power is not specific to the domain of the house (the family living in the house, their relationships, their wellbeing, property, etc.).
What would it be like to live in a house where the power volatility was not encapsulated? Whenever you wanted to consume power, you would have to first expose the wires, measure the frequency with an oscilloscope, and certify the voltage with a voltmeter. While you could use power that way, it is far easier to rely on the encapsulation of that volatility behind the receptacle, allowing you instead to add value by integrating power into your tasks or routine.
As explained previously, functional decomposition drastically increases the system’s complexity. Functional decomposition also makes maintenance a nightmare. Not only is the code in such systems complex, changes are spread across multiple services. This makes maintaining the code labor intensive, error prone, and very time-consuming. Generally, the more complex the code, the lower its quality, and low quality makes maintenance even more challenging. You must contend with high complexity and avoid introducing new defects while resolving old ones. In a functionally decomposed system, it is common for new changes to result in new defects due to the confluence of low quality and complexity. Extending the functional system often requires effort disproportionally expensive with respect to the benefit to the customer.
Even before maintenance ever starts, when the system is under development, functional decomposition harbors danger. Requirements will change throughout development (as they invariably do), and the cost of each change is huge, affecting multiple areas, forcing considerable rework, and ultimately endangering the deadline.
Systems designed with volatility-based decomposition present a stark contrast in their ability to respond to change. Since changes are contained in each module, there is at least a hope for easy maintenance with no side effects outside the module boundary. With lower complexity and easier maintenance, quality is much improved. You have a chance at reuse if something is encapsulated the same way in another system. You can extend the system by adding more areas of encapsulated volatility or integrate existing areas of volatility in a different way. Encapsulating volatility means far better resiliency to feature creep during development and a chance of meeting the schedule, since changes will be contained.
The merits of volatility-based decomposition are not specific to software systems. They are universal principles of good design, from commerce to business interactions to biology to physical systems and great software. Universal principles, by their very nature, apply to software too (else they would not be universal). For example, consider your own body. A functional decomposition of your own body would have components for every task you are required to do, from driving to programming to presenting, yet your body does not have any such components. You accomplish a task such as programming by integrating areas of volatility. For example, your heart provides an important service for your system: pumping blood. Pumping blood has enormous volatility to it: high blood pressure and low pressure, salinity, viscosity, pulse rate, activity level (sitting or running), with and without adrenaline, different blood types, healthy and sick, and so on. Yet all that volatility is encapsulated behind the service called the heart. Would you be able to program if you had to care about the volatility involved in pumping blood?
You can also integrate into your implementation external areas of encapsulated volatility. Consider your computer, which is different from literally any other computer in the world, yet all that volatility is encapsulated. As long as the computer can send a signal to the screen, you do not care what happens behind the graphic port. You perform the task of programming by integrating encapsulated areas of volatility, some internal, some external. You can reuse the same areas of volatility (such as the heart) while performing other functionalities such as driving a car or presenting your work to customers. There is simply no other way of designing and building a viable system.
Decomposing based on volatility is the essence of system design. All well-designed systems, software and physical systems alike, encapsulate their volatility inside the system’s building blocks.
Volatility-based decomposition lends itself well to regression testing. The reduction in the number of components, the reduction in the size of components, and the simplification of the interactions between components all drastically reduce the complexity of the system. This makes it feasible to write regression tests that exercise the system end to end, test each subsystem individually, and eventually test independent components. Since volatility-based decomposition contains the changes inside the building blocks of the system, once the inevitable changes do happen, they do not disrupt the regression testing in place. You can test the effect of a change in a component in isolation from the rest of the system without interfering with the inter-component and inter-subsystem testing.
The ideas and motivations behind volatility-based decomposition are simple, practical, and consistent with reality and common sense. The main challenges in performing a volatility-based decomposition have to do with time, communication, and perception. You will find that volatility is often not self-evident. No customer or product manager at the onset of a project will ever present you with the requirements for the system in the following way: “This could change, we will change that one later, and we will never change those.” The outside world (be it customers, management, or marketing) always presents you with requirements in terms of functionality: “The system should do this and that.” Even you, reading these pages, are likely struggling to wrap your head around this concept as you try to identify the areas of volatility in your current system. Consequently, volatility-based decomposition takes longer compared with functional decomposition.
Note that volatility-based decomposition does not mean you should ignore the requirements. You must analyze the requirements to recognize the areas of volatility. Arguably, the whole purpose of requirements analysis is to identify the areas of volatility, and this analysis requires effort and sweat. This is actually great news because now you are given a chance to comply with the first law of thermodynamics. Sadly, merely sweating on the problem does not mean a thing. The first law of thermodynamics does not state that if you sweat on something, you will add value. Adding value is much more difficult. This book provides you with powerful mental tools for design and analysis, including structure, guidelines, and a sound engineering methodology. These tools give you a fighting chance in your quest to add value. You still must practice and fight.
With every knowledge-intensive subject, it takes time to become proficient and effective and even more to excel at it. This is true in areas as varied as kitchen plumbing, internal medicine, and software architecture. In life, you often choose not to pursue certain areas of expertise because the time and cost required to master them would dwarf the time and cost required to utilize an expert. For example, precluding any chronic health problem, a working-age person is sick for about a week a year. A week a year of downtime due to illness is roughly 2% of the working year. So, when you are sick, do you open up medicine books and start reading, or do you go and see a doctor? At only 2% of your time, the frequency is low enough (and the specialty bar high enough) that there is little sense in doing anything other than going to the doctor. It is not worth your while to become as good as a doctor. If, however, you were sick 80% of the time, you might spend a considerable portion of your time educating yourself about your condition, possible complications, treatments, and options, often to the point of sparring with your doctor. Your innate propensity for anatomy and medicine has not changed; only your degree of investment has (hopefully, you will never have to be really good at medicine).
Similarly, when your kitchen sink is clogged somewhere behind the garbage disposal and the dishwasher, do you go to the hardware store, purchase a P-trap, an S-trap, various adapters, three different types of wrenches, various O-rings and other accessories, or do you call a plumber? It is the 2% problem again: it is not worth your while learning how to fix that sink if it is clogged less than 2% of the time. The moral is that when you spend 2% of your time on any complex task, you will never be any good at it.
With software system architecture, architects get to decompose a complete system into modules only on major revolutions of the cycle. Such events happen, on average, every few years. All other designs in the interim between clean slates are at best incremental and at worst detrimental to the existing systems. How much time will the manager allow the architect to invest in architecture for the next project? One week? Two weeks? Three weeks?? Six weeks??? The exact answer is irrelevant. On one hand, you have cycles measured in years and, on the other, activities measured in weeks. The week-to-year ratio is roughly 1:50, or 2% again. Architects have learned the hard way that they need to hone their skills getting ready for that 2% window. Now consider the architect’s manager. If the architect spends 2% of the time architecting the system, what percentage of the time does that architect’s manager spend managing said architect? The answer is probably a small fraction of that time. Therefore, the manager is never going to be good at managing architects at that critical phase. The manager is constantly going to exclaim, “I don’t understand why this is taking so long! Why can’t we just do A, B, C?”
Gaining the time to do decomposition correctly will likely be as much of a challenge as doing the decomposition, if not more so. However, the difficulty of a task should not preclude it from being done. Precisely because it is difficult, it must be done. You will see later on in this book several techniques for gaining the time.
In 1999, David Dunning and Justin Kruger published their research[2] demonstrating conclusively that people unskilled in a domain tend to look down on it, thinking it is less complex, risky, or demanding than it truly is. This cognitive bias has nothing to do with intelligence or expertise in other domains. If you are unskilled in something, you never assume it is more complex than it is; you assume it is less!
2. Justin Kruger and David Dunning, “Unskilled and Unaware of It: How Difficulties in Recognizing One’s Own Incompetence Lead to Inflated Self-Assessments,” Journal of Personality and Social Psychology 77, no. 6 (1999): 1121–1134.
When the manager is throwing hands in the air saying, “I don’t understand why this is taking so long,” the manager really does not understand why you cannot just do A, then B, and then C. Do not be upset. You should expect this behavior and resolve it correctly by educating your manager and peers who, by their own admission, do not understand.
A saying often attributed to Albert Einstein holds that doing things the same way but expecting better results is the definition of insanity. Since the manager typically expects you to do better than last time, you must point out the insanity of pursuing functional decomposition yet again and explain the merits of volatility-based decomposition. In the end, even if you fail to convince a single person, you should not simply follow orders and dig the project into an early grave. You must still decompose based on volatility. Your professional integrity (and ultimately your sanity and long-term peace of mind) is at stake.
The rest of this chapter provides you with a set of tools to use when you go searching for and identifying areas of volatility. While these techniques are valuable and effective in their own right, they are somewhat loose. The next chapter introduces structure and constraints that allow for quicker and repeatable identification of areas of volatility. However, that discussion merely fine-tunes and specializes the ideas in this section.
A key question many novices struggle with is the difference between things that change and things that are volatile. Not everything that is variable is also volatile. You resort to encapsulating a volatility at the system design level only when it is open-ended and, unless encapsulated in a component of the architecture, would be very expensive to contain. Variability, on the other hand, describes those aspects that you can easily handle in your code using conditional logic. When searching for volatility, you should be on the lookout for the kind of changes or risks that would have ripple effects across the system. Changes must not invalidate the architecture.
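A small sketch may help separate the two ideas. The commission rule below is a made-up example of mere variability: the set of cases is closed and known, so a plain conditional inside one function handles it and nothing needs to be encapsulated at the architecture level. The rates and the `is_premium` flag are assumptions for illustration only.

```python
def commission_cents(trade_value_cents: int, is_premium: bool) -> int:
    """Variability, not volatility: two known cases, handled in place.

    Rates are expressed in basis points (1 bp = 0.01%) and are hypothetical.
    """
    rate_bp = 10 if is_premium else 25
    return trade_value_cents * rate_bp // 10_000


# A $10,000.00 trade, expressed in cents.
print(commission_cents(1_000_000, is_premium=True))
print(commission_cents(1_000_000, is_premium=False))
```

By contrast, something open-ended, such as which storage technology or notification transport the system uses, cannot be contained by adding another branch to a conditional without the change rippling across components; that is the kind of change to encapsulate.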
寻找波动领域是一个发现过程,发生在需求分析和与项目利益相关者的访谈过程中。
Finding areas of volatility is a process of discovery that takes place during requirements analysis and interviews with the project stakeholders.
有一种简单的技术,我称之为波动轴。该技术检查客户使用系统的方式。在这种情况下,客户是指系统的消费者,可以是单个用户或整个其他商业实体。
There is a simple technique I call axes of volatility. This technique examines the ways the system is used by customers. Customer in this context refers to a consumer of the system, which could be a single user or a whole other business entity.
在任何业务中,系统面临变化的方式只有两种:第一个轴心是随着时间的推移,同一个客户。即使目前系统完全符合特定客户的需求,但随着时间的推移,该客户的业务环境也会发生变化。甚至客户对系统的使用也常常会改变最初编写系统时所针对的要求。3随着时间的推移,客户对系统的要求和期望也会发生变化。
In any business, there are only two ways your system could face change: the first axis is at the same customer over time. Even if presently the system is perfectly aligned with a particular customer’s needs, over time, that customer’s business context will change. Even the use of the system by the customer will often change the requirements against which it was written in the first place.3 Over time, the customer’s requirements and expectations of the system will change.
3.解决方案往往会改变其开发要求,这一趋势最早是由 19 世纪英国经济学家威廉·杰文斯 (William Jevons) 在研究煤炭生产时发现的,此后被称为杰文斯悖论。其他表现形式包括随着数字办公的兴起,纸张消耗量增加,以及随着道路容量增加,交通拥堵情况恶化。
3. The tendency of a solution to change the requirements against which it was developed was first observed by the 19th-century English economist William Jevons with regard to coal production, and it has since been referred to as the Jevons paradox. Other manifestations are the increase in paper consumption with the digital office and the worsening traffic congestion following an increase in road capacity.
第二种变化是同时发生在客户身上。如果你能暂停时间并检查你的客户群,你现在所有的客户是否都以完全相同的方式使用系统?他们中有些人的做法与其他人有什么不同?你必须适应这些差异吗?所有这些变化定义了波动的第二个轴。
The second way change could come is at the same time across customers. If you could freeze time and examine your customer base, are all your customers now using the system in exactly the same way? What are some of them doing that is different from the others? Do you have to accommodate such differences? All such changes define the second axis of volatility.
在访谈中寻找潜在波动性时,您会发现用波动性轴(同一客户随时间的变化、所有客户在同一时间点的变化)来表述问题非常有帮助。以这种方式提出问题有助于您识别波动性。如果某些东西没有映射到波动性轴,您根本不应该封装它,并且您的系统中不应该有它映射到的构建块。创建这样的块可能表明功能分解。
When searching for potential volatility in interviews, you will find it very helpful to phrase the questions in terms of the axes of volatility (same customer over time, all customers at the same point in time). Framing the questions in this way helps you identify the volatilities. If something does not map to the axes of volatility, you should not encapsulate it at all, and there should be no building block in your system to which it is mapped. Creating such a block would likely indicate functional decomposition.
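One way to keep track of the answers to such interview questions is sketched below in Python. The types and names (`Axis`, `InterviewFinding`, `axes_of`, `is_candidate_volatility`) are assumptions made for illustration, and the two boolean fields simply record the answers to the two axis questions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set


class Axis(Enum):
    SAME_CUSTOMER_OVER_TIME = "same customer over time"
    ALL_CUSTOMERS_SAME_TIME = "all customers at the same point in time"


@dataclass(frozen=True)
class InterviewFinding:
    """One thing that came up in an interview, with the two axis answers."""
    name: str
    changes_for_one_customer_over_time: bool
    differs_across_customers_now: bool


def axes_of(finding: InterviewFinding) -> Set[Axis]:
    """Return the axes of volatility the finding maps to (possibly none)."""
    axes: Set[Axis] = set()
    if finding.changes_for_one_customer_over_time:
        axes.add(Axis.SAME_CUSTOMER_OVER_TIME)
    if finding.differs_across_customers_now:
        axes.add(Axis.ALL_CUSTOMERS_SAME_TIME)
    return axes


def is_candidate_volatility(finding: InterviewFinding) -> bool:
    # A finding that maps to neither axis gets no building block.
    return len(axes_of(finding)) > 0
```

A finding for which both answers are no maps to no axis, so under the rule above it would not be encapsulated.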
通常,使用波动轴寻找波动区域的行为是一个迭代过程,与设计本身的分解交织在一起。例如,请考虑图 2-10中的设计迭代进展。
Often, the act of looking for areas of volatility using the axes of volatility is an iterative process interleaved with the factoring of the design itself. Consider, for example, the progression of design iterations in Figure 2-10.
图片 2-10沿波动率轴的设计迭代
Figure 2-10 Design iterations along axes of volatility
您对所提议架构的第一印象可能看起来像图 A — 一个大组件,一个单一组件。问问自己,您能永远对某个特定客户使用相同的组件吗?如果答案是否定的,那么为什么?通常,这是因为您知道客户会随着时间的推移想要改变某个特定的东西。在这种情况下,您必须封装那个东西,产生图 B。现在问自己,您现在可以对所有客户使用图 B 吗?如果答案是否定的,那么找出客户想要做不同的事情,封装它,并生成图 C。您继续以这种方式分解设计,直到将波动轴上的所有可能点都封装起来。
Your first take of the proposed architecture might look like diagram A—one big thing, one single component. Ask yourself, Could you use the same component, as is, with a particular customer, forever? If the answer is no, then why? Often, it is because you know that customer will, over time, want to change a specific thing. In that case, you must encapsulate that thing, yielding diagram B. Ask yourself now, Could you use diagram B across all customers now? If the answer is no, then identify the thing that the customers want to do differently, encapsulate it, and produce diagram C. You keep factoring the design that way until all possible points on the axes of volatility are encapsulated.
轴几乎总是独立的。随着时间推移,某个客户的变化不应该在同一时间点对所有客户产生同样大的变化,反之亦然。如果变化区域不能被隔离到某个轴上,通常表明存在隐蔽的功能分解。
Almost always, the axes should be independent. Something that changes for one customer over time should not change as much across all customers at the same point in time, and vice versa. If areas of change cannot be isolated to one of the axes, it often indicates a functional decomposition in disguise.
您可以使用波动轴来概括房屋的波动性。首先看看您自己的房子,观察它随着时间的推移是如何变化的。例如,考虑家具。随着时间的推移,您可能会重新布置客厅里的家具,偶尔添加新家具或更换旧家具。结论是,房屋中的家具是不稳定的。接下来考虑电器。随着时间的推移,您可能会改用节能电器。您可能已经用平板等离子屏幕替换了旧的 CRT,并用超薄的大型 OLED 电视替换了旧的 CRT。这强烈表明您家中的电器是不稳定的。房子里的居住者怎么样?这方面是静态的吗?您曾经有客人来过吗?房子里可以没有人吗?房子里的居住者是变化无常的。外观怎么样?您是否曾经粉刷过房子、更换过窗帘或美化过环境?房子的外观是不稳定的。房子可能连接到一些公用设施,从互联网到电力和安全。之前,我指出了房屋中的电力波动性,但互联网呢?在过去的几年里,您可能使用拨号上网,然后转向 DSL,然后是电缆,现在是光纤或卫星连接。虽然这些选项截然不同,但您不会想根据连接类型改变发送电子邮件的方式。您应该将所有公用事业的波动性封装起来。图 2-11显示了沿第一个波动轴(同一客户随时间变化)的这种可能分解。
You can use the axes of volatility to encapsulate the volatility of a house. Start by looking at your own house and observe how it changes over time. For example, consider furniture. Over time, you may rearrange the furniture in the living room and occasionally add new pieces or replace old ones. The conclusion is that furniture in a house is volatile. Next consider appliances. Over time, you may switch to energy-efficient appliances. You likely have already replaced the old CRT with a flat plasma screen, and that with a large, wafer-thin OLED TV. This is a strong indication that at your house, appliances are volatile. How about the occupants of the house? Is that aspect static? Do you ever have guests come over? Can the house be empty of people? The occupants of the house are volatile. What about appearance? Do you ever paint the house, or change the draperies or landscaping? The appearance of a house is volatile. The house is likely connected to some utilities, from Internet to power and security. Previously, I pointed out the power volatility in a house, but what about Internet? In years past, you may have used dial-up for Internet, then moved to DSL, then cable, and now fiber optics or a satellite connection. While these options are drastically different, you would not want to change the way you send emails based on the type of connectivity. You should encapsulate the volatilities of all utilities. Figure 2-11 shows this possible decomposition along the first axis of volatility (same customer over time).
图 2-11同一栋房屋随时间的变化
Figure 2-11 Same house over time
现在,即使在同一时间点,你的房子是否与其他所有房子都一样?其他房子的结构不同,因此房子的结构是不稳定的。即使你将你的房子复制粘贴到另一个城市,它还是同一所房子吗?4答案显然是否定的。这所房子将有不同的邻居,并受不同城市法规、建筑规范和税收的约束。图 2-12显示了沿第二条波动轴(同一时间点的不同客户)的这种可能分解。
Now, even at the same point in time, is your house the same as every other house? Other houses have a different structure, so the structure of the house is volatile. Even if you were to copy and paste your house to another city, would it be the same house?4 The answer is clearly negative. The house will have different neighbors and be subjected to different city regulations, building codes, and taxes. Figure 2-12 shows this possible decomposition along the second axis of volatility (different customers at the same point in time).
图2-12同时跨房屋
Figure 2-12 At the same time across houses
4.古希腊人在忒修斯悖论中努力解答了这个问题(https://en.wikipedia.org/wiki/Ship_of_Theseus)。
4. The ancient Greeks grappled with this question in Theseus’s paradox (https://en.wikipedia.org/wiki/Ship_of_Theseus).
请注意轴的独立性。随着时间的推移,您所居住的城市确实会改变其法规,但变化的速度很慢。同样,只要您住在同一所房子里,新邻居出现的可能性就相当低,但如果您在同一时间点将您的房子与另一所房子进行比较,则出现新邻居的可能性是肯定的。因此,将波动性分配给其中一个轴并不是绝对排除,而更像是不成比例的概率。
Note the independence of the axes. The city where you live over time does change its regulations, but the changes come at a slow pace. Similarly, the likelihood of new neighbors is fairly low as long as you live in the same house but is a certainty if you compare your house to another at the same point in time. The assignment of a volatility to one of the axes is therefore not an absolute exclusion but more one of disproportional probability.
还要注意的是,该Neighbors Volatility组件可以处理同一房屋内邻居随时间变化的波动,就像它可以在同一时间点处理不同房屋的波动一样容易。将组件分配给轴有助于首先发现波动性;波动性在同一时间点的不同房屋之间更加明显。
Note also that the Neighbors Volatility component can deal with volatility of neighbors at the same house over time as easily as it can do that across different houses at the same point in time. Assigning the component to an axis helps to discover the volatility in the first place; the volatility is just more apparent across different houses at the same point in time.
最后,与图 2-6和图 2-7的分解形成鲜明对比的是,图 2-11和图 2-12中的分解中没有烹饪或厨房的组件。在基于波动性的分解中,所需的行为是通过各种封装的波动性区域之间的相互作用来实现的。烹饪晚餐可能是居住者、电器、结构和公用设施之间相互作用的产物。由于仍然需要某种东西来管理这种相互作用,因此设计并不完整。波动性轴是一个很好的起点,但它并不是解决问题的唯一工具。
Finally, in sharp contrast to the decompositions of Figure 2-6 and Figure 2-7, in Figure 2-11 and Figure 2-12 there is no component in the decomposition for cooking or kitchen. In a volatility-based decomposition, the required behavior is accomplished by an interaction between the various encapsulated areas of volatility. Cooking dinner may be the product of an interaction between the occupants, the appliances, the structure, and the utilities. Since something still needs to manage that interaction, the design is not complete. The axes of volatility are a great starting point, but they are not the only tool to bring to bear on the problem.
再考虑一下房屋支持烹饪功能的功能需求。此类需求在需求规范中很常见,许多开发人员只需将其映射到Cooking其架构中的组件即可。然而,烹饪并不是一项要求(即使它包含在要求规范中)。烹饪是满足家中人员吃饭需求的一种可能解决方案。您可以通过订购披萨或带家人出去吃饭来满足吃饭需求。
Consider again the functional requirement for the house to support the cooking feature. Such requirements are quite common in requirements specs, and many developers will simply map that to a Cooking component in their architecture. Cooking, however, is not a requirement (even though it was in the requirement spec). Cooking is a possible solution for the requirement of feeding the people in the house. You can satisfy the feeding requirement by ordering pizza or taking the family out for dinner.
客户提供伪装成需求的解决方案的情况极为常见。使用功能分解后,一旦您部署的系统只有Cooking,客户就会要求提供披萨选项,从而导致系统中出现另一个组件或另一个组件膨胀。“外出就餐”的需求很快就会出现,导致功能围绕实际需求不断循环。使用基于波动性的分解,在需求分析期间,您应该确定为乘客提供餐食的波动性并为其提供餐食。餐食的波动性封装在Feeding组件中,并且随着餐食选项的变化,您的设计不会发生变化。
It is exceedingly common for customers to provide solutions masquerading as requirements. With functional decomposition, once you deploy the system with only Cooking, the customer will ask for the pizza option, resulting in either another component in your system or bloating of another component. The “going out to dinner” requirement will soon follow, leading to a never-ending cycle of features going around and around the real requirement. With volatility-based decomposition, during requirements analysis, you should identify the volatility in feeding the occupants and provide for it. The volatility of feeding is encapsulated within the Feeding component, and as the feeding options change, your design does not.
然而,虽然喂食是比烹饪更好的要求,但它仍然是伪装成要求的解决方案。如果为了节食,家里的人今晚应该饿着肚子睡觉怎么办?喂食要求和节食要求可能互相排斥。你可以做其中之一,但不能同时做两者。互相排斥的要求也很常见。
However, while feeding is a better requirement than cooking, it is still a solution masquerading as a requirement. What if in the interest of diet, the people in the house should go to bed hungry tonight? A feeding requirement and diet requirement might be mutually exclusive. You can do either one, but not both. Mutually exclusive requirements are also quite common.
任何房屋的真正要求是照顾居住者的幸福感,而不仅仅是他们的卡路里摄入量。房子不应该太冷或太热,太潮湿或太干燥。虽然顾客可能只讨论烹饪,从不讨论温度控制,但你应该认识到真正的波动性和幸福感,并将其融入到Wellbeing你的建筑组件中。
The real requirement for any house is to take care of the well-being of the occupants, not just their caloric intake. The house should not be too cold or too warm or too humid or too dry. While the customers may only discuss cooking and never discuss temperature control, you should recognize the real volatility, well-being, and encapsulate that in the Wellbeing component of your architecture.
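The idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Wellbeing` class, its `register_option` and `care_for` methods, and the option functions are hypothetical names, and the sketch covers only the feeding side of well-being, not temperature or humidity.

```python
from typing import Callable, Dict, List

# A care option takes the occupants and returns a description of what was done.
CareOption = Callable[[List[str]], str]


class Wellbeing:
    """Callers ask for care of the occupants; they never ask for 'cooking'."""

    def __init__(self) -> None:
        self._options: Dict[str, CareOption] = {}

    def register_option(self, name: str, option: CareOption) -> None:
        self._options[name] = option

    def care_for(self, occupants: List[str], plan: str) -> str:
        if plan not in self._options:
            raise ValueError(f"unknown plan: {plan}")
        return self._options[plan](occupants)


def cook_dinner(occupants: List[str]) -> str:
    return f"cooked dinner for {len(occupants)}"


def order_pizza(occupants: List[str]) -> str:
    return f"ordered pizza for {len(occupants)}"


def skip_meal_for_diet(occupants: List[str]) -> str:
    # Feeding and dieting are mutually exclusive; both fit behind Wellbeing.
    return f"no meal tonight for {len(occupants)} (diet)"
```

In this sketch, cooking, pizza, and even the diet are interchangeable options registered with one component, so a new option is a registration rather than a new block in the design.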
由于大多数需求规范都充斥着伪装成需求的解决方案,因此功能分解绝对会最大化您的痛苦。您将永远追逐不断发展的解决方案,而永远不会认识到真正的潜在需求。
Since most requirements specifications are chock-full of solutions masquerading as requirements, functional decomposition absolutely maximizes your pain. You will forever be chasing the ever-evolving solutions, never recognizing the true underlying requirements.
需求规范中所有那些伪装成需求的解决方案其实都是因祸得福,因为您可以将家庭烹饪示例推广为一种真正的分析技术,用于发现波动性领域。首先指出伪装成需求的解决方案,然后询问是否还有其他可能的解决方案?如果有,那么真正的需求和潜在的波动性是什么?一旦确定了波动性,您必须确定解决该波动性的需求是真正的需求还是仍然是伪装成需求的解决方案。一旦你完成了所有解决方案的清理,剩下的很可能是基于波动性的分解的极佳候选者。
The fact that requirements specifications have all those solutions masquerading as requirements is actually a blessing in disguise because you can generalize the example of cooking in the house into a bona fide analysis technique for discovering areas of volatility. Start by pointing out the solutions masquerading as requirements, and ask whether there are other possible solutions. If so, then what were the real requirements and the underlying volatility? Once you identify the volatility, you must determine if the need to address that volatility is a true requirement or is still a solution masquerading as a requirement. Once you have finished scrubbing away all the solutions, what you are left with are likely great candidates for volatility-based decomposition.
在分解系统和创建架构之前,您应该简单地编制一份候选波动区域列表,作为需求收集和分析的自然组成部分。您应该以开放的心态对待这份列表。问问自己,沿着波动轴,什么可能会发生变化。识别伪装成需求的解决方案,并应用本章后面描述的其他技术。这份列表是跟踪您的观察结果和组织您的想法的强大工具。不要着手实际设计。您所做的只是维护一份列表。请注意,虽然系统设计不应超过几天,但识别正确的波动区域可能需要更长的时间。
Prior to decomposing a system and creating an architecture, you should simply compile a list of the candidate areas of volatility as a natural part of requirements gathering and analysis. You should approach the list with an open mind. Ask what could change along the axes of volatility. Identify solutions masquerading as requirements, and apply the additional techniques described later in this chapter. The list is a powerful instrument for keeping track of your observations and organizing your thoughts. Do not commit yet to the actual design. All you are doing is maintaining a list. Note that while the design of the system should not take more than a few days, identifying the correct areas of volatility may take considerably longer.
使用先前对股票交易系统的要求,您应该首先准备一份可能出现波动的领域的清单,并了解每个波动背后的原因:
Using the previous requirements for the stock trading system, you should start by preparing a list of possible areas of volatility, capturing also the rationale behind each:
用户波动。交易员为最终客户提供服务,而他们负责管理这些客户的投资组合。最终客户也可能对他们的资金现状感兴趣。虽然他们可以给交易员写信或打电话,但更合适的方式是让最终客户登录系统查看当前余额和正在进行的交易。即使要求从未提及任何有关最终客户访问权限的内容(要求针对专业交易员),您也应该考虑这种访问权限。虽然最终客户可能无法进行交易,但他们应该能够查看其账户的状态。也可能有系统管理员。用户类型存在波动。
User volatility. The traders serve end customers on whose portfolios they operate. The end customers are also likely interested in the current state of their funds. While they could write the trader a letter or call, a more appropriate means would be for the end customers to log into the system to see the current balance and the ongoing trades. Even though the requirements never stated anything about end customer access (the requirements were for professional traders), you should contemplate such access. While the end customers may not be able to trade, they should be able to see the status of their accounts. There could also be system administrators. There is volatility in the type of user.
客户端应用程序波动。用户的波动通常表现为客户端应用程序和技术类型的波动。一个简单的网页可能足以让外部终端客户查询他们的余额。然而,专业交易员更喜欢多显示器、丰富的桌面应用程序,其中包含市场趋势、账户详细信息、市场行情、新闻提要、电子表格预测和专有数据。其他用户可能希望在各种类型的移动设备上查看交易。
Client application volatility. Volatility in users often manifests in volatility in the type of client application and technology. A simple web page may suffice for external end customers looking up their balance. However, professional traders will prefer a multi-monitor, rich desktop application with market trends, account details, market tickers, newsfeed, spreadsheet projection, and proprietary data. Other users may want to review the trades on mobile devices of various types.
安全性波动。用户的波动意味着用户在系统中验证身份的方式的波动。内部交易员的数量可能很少,从几十人到几百人不等。然而,该系统可能数百万终端客户。内部交易员可以依靠域帐户进行身份验证,但对于通过互联网访问信息的数百万客户来说,这是一个糟糕的选择。对于互联网用户来说,也许一个简单的用户名和密码就可以了,或者可能需要一些复杂的联合安全单点登录选项。授权选项也存在类似的波动性。安全性是不稳定的。
Security volatility. Volatility in users implies volatility in how the users authenticate themselves against the system. The number of in-house traders could be small, from a few dozen to a few hundred. The system, however, could have millions of end customers. The in-house traders could rely on domain accounts for authentication, but this is a poor choice for the millions of customers accessing information through the Internet. For Internet users, perhaps a simple user name and password will do, or maybe some sophisticated federated security single sign-on option is needed. Similar volatility exists with authorization options. Security is volatile.
通知易变性。需求指定系统在每次请求后发送电子邮件。但是,如果电子邮件被退回怎么办?系统是否应该恢复使用纸质信件?用短信或传真代替电子邮件怎么样?发送电子邮件的需求是伪装成需求的解决方案。真正的需求是通知用户,但通知传输易变。接收通知的人也存在易变性:单个用户或向接收相同通知的多个用户广播并通过任何传输方式接收。也许最终客户更喜欢电子邮件,而最终客户的税务律师更喜欢书面纸质声明。首先发布通知的人也存在易变性。
Notification volatility. The requirements specify that the system is to send an email after every request. However, what if the email bounces? Should the system fall back to a paper letter? How about a text message or a fax instead of an email? The requirement to send an email is a solution masquerading as a requirement. The real requirement is to notify the users, but the notification transport is volatile. There is also volatility in who receives the notification: a single user or a broadcast to several users receiving the same notification and over whichever transport. Perhaps the end customer prefers an email while the end customer’s tax lawyer prefers a documented paper statement. There is also volatility in who publishes the notification in the first place.
存储易变性。要求指定使用本地数据库。但是,随着时间的推移,越来越多的系统迁移到云中。股票交易本身并没有什么可以阻止从云的成本和规模经济中获益。使用本地数据库的要求实际上是另一种伪装成要求的解决方案。更好的要求是数据持久性,它可以适应持久性选项中的易变性。但是,大多数用户都是最终客户,这些用户实际上执行只读请求。这意味着系统将从使用内存缓存中受益匪浅。此外,一些云产品使用分布式内存哈希表,提供与传统基于文件的持久存储相同的弹性。要求数据持久性将排除这最后两个选项,因为数据持久性仍然是一种伪装成要求的解决方案。真正的要求只是系统不能丢失数据,或者系统需要存储数据。如何实现这一点是一个实施细节,具有很大的波动性,从本地数据库到云端的远程内存缓存。
Storage volatility. The requirements specify the use of a local database. However, over time, more and more systems migrate to the cloud. There is nothing inherent in stock trading that precludes benefiting from the cost and economy of scale of the cloud. The requirement to use a local database is actually another solution masquerading as a requirement. A better requirement is data persistence, which accommodates the volatility in the persistence options. However, the majority of users are end customers, and those users actually perform read-only requests. This implies the system will benefit greatly from the use of an in-memory cache. Furthermore, some cloud offerings utilize a distributed in-memory hash table that offers the same resiliency as traditional file-based durable storage. Requiring data persistence would exclude these last two options because data persistence is still a solution masquerading as a requirement. The real requirement is simply that the system must not lose the data, or that the system is required to store the data. How that is accomplished is an implementation detail, with a great deal of volatility, from a local database to a remote in-memory cache in the cloud.
连接和同步不稳定。当前要求以连接、同步、锁步的方式填写 Web 表单并按顺序提交。这意味着交易员一次只能执行一个请求。但是,交易员执行的交易越多,他们赚的钱就越多。如果请求是独立的,为什么不异步发出它们?如果请求在时间上被推迟(未来的交易),为什么不将对系统的调用排队以减少负载?执行异步调用(包括排队调用)时,请求可能会无序执行。连接和同步性不稳定。
Connection and synchronization volatility. The current requirements call for a connected, synchronous, lock-step manner of completing a web form and submitting it in order. This implies that the traders can do only one request at a time. However, the more trades the traders execute, the more money they make. If the requests are independent, why not issue them asynchronously? If the requests are deferred in time (trades in the future), why not queue up the calls to the system to reduce the load? When performing asynchronous calls (including queued calls), the requests can execute out of order. Connectivity and synchronicity are volatile.
持续时间和设备波动。有些用户会在一次短暂的会话中完成交易。但是,交易者在进行分散和对冲风险的复杂交易时,可以赚取利润并实现收入最大化,这些交易涉及多个股票和行业、国内或国外市场等。构建这样的交易可能非常耗时,持续数小时到数天不等。这种长时间的交互可能会跨越多个系统会话,甚至可能是多个物理设备。交互持续时间存在波动,这反过来会触发所涉及的设备和连接的波动。
Duration and device volatility. Some users will complete a trade in one short session. However, traders earn their keep and maximize their income when they perform complicated trades that distribute and hedge risk, involving multiple stocks and sectors, domestic or foreign markets, and so on. Constructing such a trade can be time-consuming, lasting anywhere from several hours to several days. Such a long-running interaction will likely span multiple system sessions and possibly multiple physical devices. There is volatility in the duration of the interaction, which in turn triggers volatility in the devices and connections involved.
贸易项目波动性。如前所述,随着时间的推移,最终客户可能不仅想交易股票,还想交易商品、债券、货币,甚至期货合约。贸易项目本身具有波动性。
Trade item volatility. As discussed previously, over time, the end customers may want to trade not just stocks but also commodities, bonds, currencies, and maybe even futures contracts. The trade item itself is volatile.
工作流程不稳定。如果交易项目不稳定,则交易所涉及的步骤的处理也会不稳定。买卖股票、安排订单等与出售商品、债券或货币有很大不同。因此,交易的工作流程不稳定。同样,交易分析的工作流程也是不稳定的。
Workflow volatility. If the trade item is volatile, processing of the steps involved in the trade will be volatile too. Buying and selling stocks, scheduling their orders, and so on are very different from selling commodities, bonds, or currencies. The workflow of the trade is therefore volatile. Similarly, the workflow of trade analysis is volatile.
地区和法规的波动。随着时间的推移,系统可能会部署到不同的地区。地区的波动对交易规则、用户界面本地化、交易项目列表、税收和法规合规性有重大影响。地区和适用的法规都是不稳定的。
Locale and regulations volatility. Over time, the system may be deployed into different locales. Volatility in the locale has drastic implications on the trading rules, UI localization, the listing of trade items, taxation, and regulatory compliance. The locale and the regulations that apply therein are volatile.
市场信息波动。市场数据的来源可能随时间而变化。各种信息有不同的格式、成本、更新率、通信协议等。不同的信息可能在同一时间点显示同一股票的略微不同的价值。信息可以是外部的(例如彭博或路透社)或内部的(例如用于测试、诊断或交易算法研究的模拟市场数据)。市场信息具有波动性。
Market feed volatility. The source of market data could change over time. Various feeds have a different format, cost, update rate, communication protocols, and so on. Different feeds may show slightly different value for the same stock at the same point in time. The feeds can be external (e.g., Bloomberg or Reuters) or internal (e.g., simulated market data for testing, diagnostics, or trading algorithms research). The market feed is volatile.
上述列表绝不是股票交易系统中可能发生变化的所有事物的详尽列表。其目的是指出可能发生变化的内容以及在寻找波动性时需要采用的思维方式。一些波动区域可能超出项目范围。它们可能会被领域专家排除在外,因为它们不太可能发生,或者可能与业务性质过于相关(例如从股票扩展到货币或国外市场)。然而,我的经验是,尽早指出波动区域并将其映射到分解中至关重要。在架构中指定组件几乎不需要花费任何成本。稍后,您必须决定是否分配精力来设计和构建它。但是,至少现在您知道如何处理这种可能性。
The preceding list is by no means an exhaustive list of all the things that could change in a stock trading system. Its objective is to point out what could change and the mindset you need to adopt when searching for volatility. Some of the volatile areas may be out of scope for the project. They may be ruled out by domain experts as improbable or may relate too much to the nature of the business (such as branching out of stocks into currencies or foreign markets). My experience, however, is that it is vital to call out the areas of volatility and map them in your decomposition as early as possible. Designating a component in the architecture costs you next to nothing. Later, you must decide whether or not to allocate the effort to designing and constructing it. However, at least now you are aware of how to handle that eventuality.
一旦确定了波动性区域,就需要将它们封装到架构的组件中。图 2-13描述了一种可能的分解。
Once you have settled on the areas of volatility, you need to encapsulate them in components of the architecture. One such possible decomposition is depicted in Figure 2-13.
图 2-13基于波动率的交易系统分解
Figure 2-13 Volatility-based decomposition of a trading system
从易变区域列表到架构组件的转换几乎从来都不是一对一的。有时,单个组件可以封装多个易变区域。某些易变区域可能不会直接映射到组件,而是映射到操作概念,例如排队或发布事件。在其他时候,区域的易变性可能封装在第三方服务中。
The transition from the list of volatile areas to components of the architecture is hardly ever one to one. Sometimes a single component can encapsulate more than one area of volatility. Some areas of volatility may not be mapped directly to a component but rather to an operational concept such as queuing or publishing an event. At other times, the volatility of an area may be encapsulated in a third-party service.
在设计时,始终从简单易行的决策开始。这些决策会约束系统,使后续决策更容易。在此示例中,某些映射很容易进行。数据存储中的易失性封装在数据访问组件后面,这些组件不会泄露存储的位置以及使用何种技术来访问它。请注意图 2-13中将存储称为Storage和不称为 的关键抽象Database。虽然实现(根据要求)是一个本地数据库,但架构中没有任何内容排除其他选项,例如原始文件系统、缓存或云。如果对存储进行了更改,则更改将封装在相应的访问组件(例如Trades Access)中,并且不会影响其他组件,包括任何其他访问组件。这使您能够以最小的后果更改存储。
With design, always start with the simple and easy decisions. Those decisions constrain the system, making subsequent decisions easier. In this example, some mapping is easy to do. The volatility in the data storage is encapsulated behind data access components, which do not betray where the storage is and what technology is used to access it. Note in Figure 2-13 the key abstraction of referring to the storage as Storage and not as Database. While the implementation (according to the requirements) is a local database, there is nothing in the architecture that precludes other options, such as the raw file system, a cache, or the cloud. If a change to the storage takes place, it is encapsulated in the respective access component (such as the Trades Access) and does not affect the other components, including any other access component. This enables you to change the storage with minimal consequences.
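A minimal Python sketch of this kind of access component is shown below. The interface and its method names (`TradesAccess`, `save_trade`, `find_trade`), the two implementations, and the `place_and_confirm` caller are all assumptions made for illustration; the text does not prescribe any of them. One implementation uses the standard `sqlite3` module as a stand-in for a local database, the other a plain dictionary as a stand-in for a cache.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple

Trade = Tuple[str, int]  # (symbol, quantity)


class TradesAccess(ABC):
    """Hides where trades are stored and which technology stores them."""

    @abstractmethod
    def save_trade(self, trade_id: str, symbol: str, quantity: int) -> None: ...

    @abstractmethod
    def find_trade(self, trade_id: str) -> Optional[Trade]: ...


class SqliteTradesAccess(TradesAccess):
    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS trades "
            "(id TEXT PRIMARY KEY, symbol TEXT NOT NULL, quantity INTEGER NOT NULL)")

    def save_trade(self, trade_id: str, symbol: str, quantity: int) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute("INSERT OR REPLACE INTO trades VALUES (?, ?, ?)",
                               (trade_id, symbol, quantity))

    def find_trade(self, trade_id: str) -> Optional[Trade]:
        row = self._conn.execute(
            "SELECT symbol, quantity FROM trades WHERE id = ?", (trade_id,)).fetchone()
        return None if row is None else (row[0], row[1])


class InMemoryTradesAccess(TradesAccess):
    def __init__(self) -> None:
        self._trades: Dict[str, Trade] = {}

    def save_trade(self, trade_id: str, symbol: str, quantity: int) -> None:
        self._trades[trade_id] = (symbol, quantity)

    def find_trade(self, trade_id: str) -> Optional[Trade]:
        return self._trades.get(trade_id)


def place_and_confirm(access: TradesAccess, trade_id: str,
                      symbol: str, quantity: int) -> Optional[Trade]:
    # The caller depends only on TradesAccess, never on the storage behind it.
    access.save_trade(trade_id, symbol, quantity)
    return access.find_trade(trade_id)
```

Swapping `SqliteTradesAccess` for `InMemoryTradesAccess` leaves `place_and_confirm` unchanged, which is the sense in which a storage change stays inside the access component.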
通知客户端的易变性封装在Notification组件中。该组件知道如何通知每个客户端以及哪些客户端订阅哪个事件。对于简单的场景,您可以使用通用事件发布和订阅服务 ( Pub/Sub ) 而不是自定义Notification组件进行充分管理。但是,在这种情况下,传输类型和广播性质可能存在一些业务规则。Notification组件可能仍会在其下方使用一些 Pub/Sub 服务,但这是一个内部实现细节,其易变性也封装在Notification组件中。
The volatility in notifying the clients is encapsulated in the Notification component. This component knows how to notify each client and which clients subscribe to which event. For simple scenarios, you can manage sufficiently with a general-purpose events publishing and subscription service (Pub/Sub) instead of a custom Notification component. However, in this case, there are likely some business rules on the type of transport and nature of the broadcast. The Notification component may still use some Pub/Sub service underneath it, but that is an internal implementation detail whose volatility is also encapsulated in the Notification component.
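The following Python sketch shows one possible shape for such a component. It is an illustration only: the class layout, the method names (`register_transport`, `subscribe`, `publish`), and the idea of modeling a transport as a function are assumptions, not something the text specifies.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

# A transport takes (client, message) and returns a delivery record.
Transport = Callable[[str, str], str]


class Notification:
    """Knows which clients subscribe to which event and how to reach each one."""

    def __init__(self) -> None:
        self._transports: Dict[str, Transport] = {}
        self._subscribers: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def register_transport(self, name: str, transport: Transport) -> None:
        self._transports[name] = transport

    def subscribe(self, event: str, client: str, transport_name: str) -> None:
        if transport_name not in self._transports:
            raise ValueError(f"unknown transport: {transport_name}")
        self._subscribers[event].append((client, transport_name))

    def publish(self, event: str, message: str) -> List[str]:
        # Broadcast to every subscriber of the event over that subscriber's
        # own transport; publishers never see the transport choice.
        return [self._transports[name](client, message)
                for client, name in self._subscribers.get(event, [])]


def send_email(client: str, message: str) -> str:
    return f"email to {client}: {message}"


def send_letter(client: str, message: str) -> str:
    return f"letter to {client}: {message}"
```

A publisher in this sketch calls only `publish`; adding a text message or fax transport is a new registration and does not touch the publishers.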
交易工作流中的波动性被封装在Trade Workflow组件中。该组件封装了交易商品(股票或货币)的波动性、买卖交易商品的具体步骤、当地市场所需的定制、所需报告的详细信息等等。请注意,即使交易商品是固定的(不易变),交易股票的工作流也可能发生变化,这证明了使用Trade Workflow封装波动性的合理性。该设计还依赖于存储工作流的操作概念(这应该使用某些第三方工作流工具来实现)。Trade Workflow检索每个会话的适当工作流实例,对其进行操作,然后将其存储回存储中Workflow。这个概念有助于封装多种波动性。首先,不同的交易商品现在可以拥有不同的交易工作流。其次,不同的语言环境可以拥有不同的工作流。第三,这可以支持跨多个设备和会话的长期运行工作流。系统不关心两个呼叫是相隔几秒还是几天。在每种情况下,系统都会加载工作流实例以处理下一步。该设计将连接的单会话交易与长期运行的分布式交易完全相同。对称性和一致性是系统架构的良好品质。还请注意,工作流存储访问的封装方式与交易存储访问的封装方式相同。
The volatility in the trading workflow is encapsulated in the Trade Workflow component. That component encapsulates the volatility of what is being traded (stocks or currencies), the specific steps involved in buying or selling a trade item, the required customization for local markets, the details for the required reports, and so on. Note that even if the trade items are fixed (not volatile), the workflow of trading stocks can change, justifying the use of Trade Workflow to encapsulate the volatility. The design also relies on the operational concept of storing the workflows (this should be implemented using some third-party workflow tool). Trade Workflow retrieves the appropriate workflow instance for each session, operates on it, and stores it back in the Workflow Storage. This concept helps encapsulate several volatilities. First, different trade items can now have distinct trading workflows. Second, different locales can have different workflows. Third, this enables supporting long-running workflows spanning multiple devices and sessions. The system does not care if two calls are seconds apart or days apart. In each case, the system loads the workflow instance to process the next step. The design treats connected, single-session trades exactly the same as a long-running distributed trade. Symmetry and consistency are good qualities in system architecture. Note also that the workflow storage access is encapsulated in the same fashion as the trades storage access.
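The load, operate, and store cycle described above can be sketched in Python as follows. Everything concrete here is an assumption for illustration: the class and method names, the step names in `DEFINITIONS`, and the use of JSON strings in a dictionary as a stand-in for the Workflow Storage. As noted above, a real design would more likely rely on a third-party workflow tool.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class WorkflowInstance:
    workflow_id: str
    steps: List[str]
    next_step: int = 0


class WorkflowStorage:
    """Persists workflow instances between calls (here as JSON strings)."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}

    def store(self, instance: WorkflowInstance) -> None:
        self._records[instance.workflow_id] = json.dumps(asdict(instance))

    def load(self, workflow_id: str) -> WorkflowInstance:
        return WorkflowInstance(**json.loads(self._records[workflow_id]))


class TradeWorkflow:
    # Hypothetical step lists; different trade items get distinct workflows.
    DEFINITIONS: Dict[str, List[str]] = {
        "stock": ["validate", "schedule", "execute"],
        "currency": ["validate", "quote", "execute", "settle"],
    }

    def __init__(self, storage: WorkflowStorage) -> None:
        self._storage = storage

    def start(self, workflow_id: str, trade_item: str) -> None:
        steps = list(self.DEFINITIONS[trade_item])
        self._storage.store(WorkflowInstance(workflow_id, steps))

    def perform_next_step(self, workflow_id: str) -> Optional[str]:
        # Load the instance, operate on it, and store it back. Whether the
        # previous call was seconds or days ago makes no difference.
        instance = self._storage.load(workflow_id)
        if instance.next_step >= len(instance.steps):
            return None
        step = instance.steps[instance.next_step]
        instance.next_step += 1
        self._storage.store(instance)
        return step
```

Because all state lives in the storage, a second `TradeWorkflow` object sharing the same `WorkflowStorage` (standing in for another session or device) can continue a workflow that the first one started.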
您可以对股票交易工作流和分析工作流使用相同的模式。专用Analysis Workflow组件封装了分析工作流中的波动性,并且可以使用相同的Workflow Storage。
You can use the same pattern for the stock trading workflow and the analysis workflows. The dedicated Analysis Workflow component encapsulates the volatility in the analysis workflows, and it can use the same Workflow Storage.
访问市场 feed 的易变性封装在 中Feed Access。此组件封装了如何访问 feed 以及 feed 本身是内部的还是外部的。来自不同 feed 的各种市场数据的格式甚至值的易变性封装在 中Feed Transformation。这两个组件都通过提供统一的接口和格式(无论数据来源如何)将其他组件与 feed 分离。
The volatility of accessing the market feed is encapsulated in the Feed Access. This component encapsulates how to access the feed and whether the feed itself is internal or external. The volatility in the format or even value of the various market data coming from the different feeds is encapsulated in the Feed Transformation component. Both of these components decouple the other components from the feeds by providing a uniform interface and format regardless of the origin of the data.
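A rough Python sketch of the two components follows. The raw payload shapes, the `Quote` type, the feed names, and all method names are assumptions for illustration only; they do not describe any real market data provider.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Quote:
    """The uniform format the rest of the system sees."""
    symbol: str
    price: float


class FeedAccess:
    """Encapsulates how each feed is reached (internal or external)."""

    def __init__(self, sources: Dict[str, Callable[[str], object]]) -> None:
        self._sources = sources

    def read(self, feed_name: str, symbol: str) -> object:
        return self._sources[feed_name](symbol)


class FeedTransformation:
    """Encapsulates the differing formats and values of the feeds."""

    def __init__(self, parsers: Dict[str, Callable[[object], Quote]]) -> None:
        self._parsers = parsers

    def to_quote(self, feed_name: str, raw: object) -> Quote:
        return self._parsers[feed_name](raw)


# Two hypothetical feeds with different raw formats.
def simulated_feed(symbol: str) -> object:
    return {"ticker": symbol, "px": "101.50"}


def vendor_feed(symbol: str) -> object:
    return f"{symbol},101.75"


def parse_simulated(raw: object) -> Quote:
    assert isinstance(raw, dict)
    return Quote(raw["ticker"], float(raw["px"]))


def parse_vendor(raw: object) -> Quote:
    assert isinstance(raw, str)
    symbol, price = raw.split(",")
    return Quote(symbol, float(price))


def latest_quote(access: FeedAccess, transformation: FeedTransformation,
                 feed_name: str, symbol: str) -> Quote:
    # Consumers receive a Quote regardless of where the data came from.
    return transformation.to_quote(feed_name, access.read(feed_name, symbol))
```

Note that the two feeds in this sketch report slightly different prices for the same symbol, and consumers of `latest_quote` still receive one uniform `Quote` type either way.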
该Security组件封装了对用户进行身份验证和授权的可能方式的易变性。在内部,它可能从本地存储中查找凭据或与某些分布式提供商进行交互。
The Security component encapsulates the volatility of the possible ways of authenticating and authorizing the users. Internally, it may look up credentials from a local storage or interact with some distributed provider.
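As a sketch of what this might look like in Python, the block below puts authentication behind a provider interface and keeps authorization as a simple permission lookup. The names (`AuthenticationProvider`, `LocalCredentialProvider`, `Security`, `is_authorized`) and the permission model are assumptions for illustration, and the code is not a production-grade security implementation.

```python
import hashlib
import hmac
import os
from abc import ABC, abstractmethod
from typing import Dict, Set, Tuple


class AuthenticationProvider(ABC):
    @abstractmethod
    def authenticate(self, user: str, password: str) -> bool: ...


class LocalCredentialProvider(AuthenticationProvider):
    """Looks up salted password hashes from a local store (a dict here)."""

    def __init__(self) -> None:
        self._records: Dict[str, Tuple[bytes, bytes]] = {}

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def add_user(self, user: str, password: str) -> None:
        salt = os.urandom(16)
        self._records[user] = (salt, self._hash(password, salt))

    def authenticate(self, user: str, password: str) -> bool:
        record = self._records.get(user)
        if record is None:
            return False
        salt, expected = record
        # Constant-time comparison of the stored and computed hashes.
        return hmac.compare_digest(expected, self._hash(password, salt))


class Security:
    """Callers ask who may do what; they never see how that is decided."""

    def __init__(self, provider: AuthenticationProvider,
                 permissions: Dict[str, Set[str]]) -> None:
        self._provider = provider
        self._permissions = permissions

    def authenticate(self, user: str, password: str) -> bool:
        return self._provider.authenticate(user, password)

    def is_authorized(self, user: str, action: str) -> bool:
        return action in self._permissions.get(user, set())
```

A federated or single sign-on provider would be another `AuthenticationProvider` subclass in this sketch, leaving the callers of `Security` unchanged.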
系统的客户端可以是交易应用程序(Trader App A)或移动应用程序(Trader App B)。最终客户可以使用自己的网站(Customer Portal)。每个客户端应用程序还封装了详细信息以及在目标设备上呈现信息的最佳方式。
The clients of the system can be the trading application (Trader App A) or a mobile app (Trader App B). The end customers can use their own website (Customer Portal). Each client application also encapsulates the details and the best way of rendering the information on the target device.
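The mapping described in the preceding paragraphs can be summarized as data, which makes it easy to see that the relation between areas of volatility and components is not one to one. The table below is one reading of the text, not a transcription of Figure 2-13; it omits areas whose mapping the text does not spell out (such as user volatility and connection and synchronization volatility), and the function name `areas_by_component` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Area of volatility -> components (or operational concepts) that encapsulate it.
ENCAPSULATED_BY: Dict[str, List[str]] = {
    "Client application": ["Trader App A", "Trader App B", "Customer Portal"],
    "Security": ["Security"],
    "Notification": ["Notification"],
    "Storage": ["Trades Access"],
    "Trade item": ["Trade Workflow"],
    "Workflow": ["Trade Workflow", "Analysis Workflow"],
    "Locale and regulations": ["Trade Workflow"],
    "Duration and device": ["Workflow Storage"],
    "Market feed": ["Feed Access", "Feed Transformation"],
}


def areas_by_component(mapping: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the mapping: which areas of volatility does each component hide?"""
    inverse: Dict[str, List[str]] = defaultdict(list)
    for area, components in mapping.items():
        for component in components:
            inverse[component].append(area)
    return dict(inverse)
```

Inverting the table shows, for example, that Trade Workflow absorbs several areas at once, while the market feed is split across two components.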
请注意,图 2-13中没有专用的报告组件。出于演示目的,报告未列为易变区域(从业务角度来看)。因此,没有什么可以用组件封装的。添加这样的组件体现了功能分解。但是,如果您只做过功能分解,您可能会听到一个不可抗拒的诱惑,召唤您添加报告块。仅仅因为您一直有一个报告块,甚至因为您有一个现有的报告块,并不意味着您需要一个报告块。
Note in Figure 2-13 the absence of a dedicated reporting component. For demonstration purposes, reporting was not listed as a volatile area (from the business perspective). Therefore, there is nothing to encapsulate with a component. Adding such a component manifests functional decomposition. However, if functional decomposition is all you have ever done, you will likely hear an irresistible siren song calling you to add a reporting block. Just because you always have had a reporting block, or even because you have an existing reporting block, does not mean you need a reporting block.
在荷马的《奥德赛》中,这个故事已有 2500 多年历史,奥德修斯经由塞壬海峡航行回家。塞壬是长着美丽翅膀的精灵般的生物,拥有天使般的声音。她们唱着一首无人能抗拒的歌声。水手们扑向她们的手臂,塞壬将人们淹死在水底并吃掉了他们。在感受到塞壬歌声的致命诱惑之前,奥德修斯(您,建筑师)被建议用蜂蜡堵住水手们(普通软件开发人员)的耳朵,并将他们绑在桨上。水手们的工作是划船(编写代码),他们甚至没有自由听塞壬的声音。另一方面,作为领导者的奥德修斯自己却没有堵住耳朵的奢侈(例如,也许您确实需要那个报告块)。奥德修斯将自己绑在船桅杆上,这样即使他想屈服于塞壬也无法做到(见图2-14,描绘了时期花瓶上的场景)。你是奥德修斯,基于挥发性的分解是你的桅杆。抵制你以前坏习惯的诱惑。
In Homer’s Odyssey, a story that is more than 2500 years old, Odysseus sails home via the Straits of the Sirens. The Sirens are beautiful winged fairy-like creatures who have the voices of angels. They sing a song that no man can resist. The sailors jump to their arms, and the Sirens drown the men under the waves and eat them. Before encountering the deadly allure of the Sirens’ songs, Odysseus (you, the architect) is advised to plug with beeswax the ears of his sailors (the rank-and-file software developers) and tie them to the oars. The sailors’ job is to row (write code), and they are not even at liberty to listen to the Sirens. Odysseus himself, on the other hand, as the leader, does not have the luxury of plugging his ears (e.g., maybe you do need that reporting block). Odysseus ties himself to the mast of the ship so that he cannot succumb to the Sirens even if he wanted to do so (see Figure 2-14, depicting the scene on a period vase). You are Odysseus, and volatility-based decomposition is your mast. Resist the siren song of your previous bad habits.
图 2-14绑在桅杆上(图片来源:Werner Forman Archive/Shutterstock)
Figure 2-14 Tied to the mast (Image: Werner Forman Archive/Shutterstock)
虽然必须封装易变区域,但并非所有可能发生变化的区域都应封装。换句话说,可能发生变化的事物不一定易变。一个典型的例子是业务性质,您不应试图封装业务性质。几乎所有业务应用程序都是为了满足业务或其客户的某些需求而存在的。但是,业务性质(以及每个应用程序的性质)往往相当稳定。一家在某个行业经营了很长时间的公司很可能会一直从事该行业。例如,联邦快递过去、现在和将来都从事运输和递送业务。虽然理论上联邦快递有可能进军医疗保健领域,但这种潜在的变化并不是您应该封装的内容。
While you must encapsulate the volatile areas, not everything that could change should be encapsulated. Put differently, things that could change are not necessarily volatile. A classic example is the nature of the business, and you should not attempt to encapsulate the nature of the business. With almost all business applications, the applications exist to serve some need of the business or its customers. However, the nature of the business, and by extension, each application, tends to be fairly constant. A company that has been in a business for a long time will likely stay in that business. For example, Federal Express has been, is, and will be in the shipment and delivery business. While in theory it is possible for Federal Express to branch into healthcare, such a potential change is not something you should encapsulate.
在系统分解过程中,您必须确定要封装的波动区域和不需要封装的区域(例如,业务性质)。有时,您最初很难区分它们。如果某些可能发生变化的东西确实是业务性质的一部分,则有两个简单的指标。第一个指标是可能的变化很少见。是的,它可能会发生,但发生的可能性非常低。第二个指标是,任何试图封装变更的尝试都只能做得很差。无论投入多少时间或精力,都无法以一种让你感到自豪的方式正确地封装该方面。
During system decomposition, you must identify both the areas of volatility to encapsulate and those not to encapsulate (e.g., the nature of the business). Sometimes, you will have initial difficulty in telling these apart. There are two simple indicators if something that could change is indeed part of the nature of the business. The first indicator is that the possible change is rare. Yes, it could happen, but the likelihood of it happening is very low. The second indicator is that any attempt to encapsulate the change can only be done poorly. No practical amount of investment in time or effort will properly encapsulate the aspect in a way of which you can be proud.
例如,考虑在一块土地上设计一栋简单的住宅。在未来的某个时候,房主可能会决定将房屋扩建为 50 层的摩天大楼。将这种可能的变化融入您的房屋设计中,会产生与典型住宅设计截然不同的设计。房屋地基不是浅模浇筑的地基,而是必须包括数十个摩擦塔,这些塔可能向下打入数百英尺以支撑建筑物的重量。这将使地基能够同时支撑单户住宅和摩天大楼。接下来,配电板必须能够分配数千安培的电流,并且可能需要房屋有自己的变压器。虽然自来水公司可以将水送到房子里,但您必须留出一个房间来放置大型水泵,以便将水推到 50 层楼。下水道管道必须能够处理 50 层楼的居民。您必须为单户住宅进行所有这些巨大的投资。
For example, consider designing a simple residential house on a plot of land. At some point in the future, the homeowner may decide to extend the home into a 50-story skyscraper. Encapsulating that possible change in your house design produces a very different design than that of your typical residential house design. Instead of a shallow form-poured foundation, the house foundation must include dozens of friction pylons, driven down to maybe hundreds of feet to support the weight of the building. This will allow the foundation to support both a single family residential and a skyscraper. Next, the power panel must be able to distribute thousands of amps and likely requires the house to have its own transformer. While the water company can bring water to the house, you must devote a room for a large water pump that can push the water up 50 floors. The sewer line must be able to handle 50 floors of inhabitants. You will have to do all that tremendous investment for a single-family home.
完工后,地基将囊括建筑物重量的变化,配电盘将囊括单户住宅和 50 层楼的需求,等等。然而,这两个指标现在都被违反了。首先,你所在城市每年有多少房主将他们的房子改建成摩天大楼?这种情况有多普遍?在一个拥有一百万户家庭的大都市区,这种情况可能每隔几年就会发生一次,因此这种变化非常罕见,甚至百万分之一的概率也只有一次。其次,你是否真的有足够的资金(最初分配给单户住宅的资金)来妥善执行所有这些囊括?一座塔架的成本可能比单户住宅的建筑成本还高。任何试图囊括未来向摩天大楼过渡的尝试都将是失败的,既无用也不划算。
When you are finished, the foundation will encapsulate the change to the weight of the building, the power panel will encapsulate the demands of both a single home and 50 stories, and so on. However, the two indicators are now violated. First, how many homeowners in your city convert their home into a skyscraper each year? How common is that? In a large metropolitan area with a million homes, it may happen once every few years, making the change very rare: once in a million, if that. Second, do you really have the funds (allocated initially for a single home) to properly execute all these encapsulations? A single pylon may cost more than the single-family building. Any attempt to encapsulate the future transition to a skyscraper will be done poorly and will be neither useful nor cost-effective.
将单户住宅改建为 50 层高的建筑是业务性质的改变。该建筑不再是用于家庭住房的业务。现在,它变成了酒店或办公楼。当土地开发商购买该地块用于此类改建时,开发商通常会选择拆除建筑,挖出旧地基,然后重新开始。业务性质的改变使您可以放弃旧系统并从头开始。需要注意的是,业务性质的背景在某种程度上是分形的。背景可以是公司的业务、公司部门或分部的业务,甚至可以是特定应用程序的业务附加值。所有这些都代表了您不应封装的内容。
Converting the single-family home to a 50-story building is a change to the nature of the business. No longer is the building in the business of housing a family. Now it is in the business of being a hotel or an office building. When a land developer purchases that plot of land for the purpose of such conversion, the developer usually chooses to raze the building, dig out the old foundation, and start afresh. A change to the nature of the business permits you to kill the old system and start from scratch. It is important to note that the context of the nature of the business is somewhat fractal. The context can be the business of the company, the business of a department or a division in a company, or even the business added value of a specific application. All these represent things that you should not encapsulate.
推测性设计是试图封装业务性质的一种变体。一旦你认同基于波动性的分解原则,你就会开始看到到处都存在可能的波动,而且很容易做得过头。如果走极端,你就有可能试图封装任何东西和所有地方。你的设计将有许多构建块,这是糟糕设计的明显标志。
Speculative design is a variation on trying to encapsulate the nature of the business. Once you subscribe to the principle of volatility-based decomposition, you will start seeing possible volatilities everywhere and can easily overdo it. When taken to the extreme, you run the risk of trying to encapsulate anything and everywhere. Your design will have numerous building blocks, a clear sign of a bad design.
例如,考虑图 2-15中的项目。
Consider for example the item in Figure 2-15.
图 2-15推测设计(图片来源:Gercen/Shutterstock)
Figure 2-15 Speculative design (Image: Gercen/Shutterstock)
该物品是一双适合潜水的女士高跟鞋。虽然身着精致晚礼服的女士可以穿着这双鞋在派对上招待客人,但她会不会马上离开,直接走到门廊,穿上潜水装备,然后潜入珊瑚礁呢?这双鞋和传统的高跟鞋一样优雅吗?在游泳或踩到尖锐的珊瑚时,它们和普通的脚蹼一样有效吗?虽然图 2-15中的物品是可以使用的,但可能性极小。此外,他们试图提供的一切都做得很差,因为他们试图将鞋子的性质从时尚配饰转变为潜水配饰,这是你永远不应该尝试的。如果你尝试这样做,你就掉进了投机性设计陷阱。大多数这样的设计只是对你的系统未来变化(即业务性质的变化)的轻率猜测。
The item is a pair of SCUBA-ready lady’s high heels. While a lady adorned in a fine evening gown could entertain her guests at the party wearing these, how likely is it that she will excuse herself, proceed immediately to the porch, draw on SCUBA gear, and dive into the reef? Are these shoes as elegant as conventional high heels? Are these as effective as regular flippers when it comes to swimming or stepping on sharp coral? While the use of the items in Figure 2-15 is possible, it is extremely unlikely. In addition, everything they try to provide is done very poorly because of the attempt to encapsulate a change to the nature of the shoe, from a fashion accessory to a diving accessory, something you should never attempt. If you try this, you have fallen into the speculative design trap. Most such designs are simply frivolous speculation on a future change to your system (i.e., a change to the nature of the business).
识别波动性的另一种有用方法是尝试为你的竞争对手(或你公司的另一个部门)设计一个系统。例如,假设你是联邦快递下一代系统的首席架构师。您的主要竞争对手是 UPS。联邦快递和 UPS 都运送包裹。两者都收取资金、安排提货和送货、跟踪包裹、为包裹投保并管理卡车和飞机车队。问自己以下问题:联邦快递可以使用 UPS 正在使用的软件系统吗?UPS 可以使用联邦快递想要构建的系统吗?如果答案是否定的,请开始列出这种重用或可扩展性的所有障碍。虽然两家公司在抽象上提供相同的服务,但他们开展业务的方式却不同。例如,联邦快递可能以一种方式规划运输路线,而 UPS 可能以另一种方式规划。在这种情况下,运输规划可能不稳定,因为如果有两种方式做某事,那么可能会有更多方式。您必须封装运输规划并在您的架构中为此目的指定一个组件。如果联邦快递在未来某个时间开始以与 UPS 相同的方式规划运输,则更改现在包含在单个组件中,这使得更改变得容易,并且只影响该组件的实现,而不会影响分解。您已为您的系统做好了面向未来的准备。
Another useful technique for identifying volatilities is to try to design a system for your competitor (or another division in your company). For example, suppose you are the lead architect for Federal Express’s next-generation system. Your main competitor is UPS. Both Federal Express and UPS ship packages. Both collect funds, schedule pickup and delivery, track packages, insure content, and manage fleets of trucks and airplanes. Ask yourself the following questions: Can Federal Express use the software system UPS is using? Can UPS use the system Federal Express wants to build? If the likely answer is no, start listing all the barriers to such reuse or extensibility. While both companies perform in the abstract the same service, the way they conduct their business is different. For example, Federal Express may plan shipment routes one way, while UPS may plan them another. In that case, shipment planning is probably volatile because if there are two ways of doing something, there may be many more. You must encapsulate the shipment planning and designate a component in your architecture for that purpose. If Federal Express starts planning shipments the same way as UPS at some future time, the change is now contained in a single component, making the change easy and affecting only the implementation of that component, not the decomposition. You have future-proofed your system.
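As a minimal sketch of what encapsulating shipment planning could look like, the Python below hides the planning approach behind a single component. The names `ShipmentPlanner`, `HubAndSpokePlanner`, `PointToPointPlanner`, and `schedule_delivery`, as well as the routing rules themselves, are hypothetical illustrations; the text does not describe how either company actually plans routes.

```python
from typing import Protocol


class ShipmentPlanner(Protocol):
    """The one component in the design that knows how routes are planned."""

    def plan(self, origin: str, destination: str) -> list[str]: ...


class HubAndSpokePlanner:
    """Hypothetical approach: route every shipment through a central hub."""

    def __init__(self, hub: str) -> None:
        self.hub = hub

    def plan(self, origin: str, destination: str) -> list[str]:
        return [origin, self.hub, destination]


class PointToPointPlanner:
    """Hypothetical alternative: ship directly from origin to destination."""

    def plan(self, origin: str, destination: str) -> list[str]:
        return [origin, destination]


def schedule_delivery(planner: ShipmentPlanner, origin: str, destination: str) -> str:
    """The rest of the system depends only on the planner's contract."""
    route = planner.plan(origin, destination)
    return " -> ".join(route)
```

Switching from one planning approach to the other changes only which planner is passed in; `schedule_delivery` and everything that calls it stay as they are, which is the sense in which the change is contained in one component.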
相反的情况也是如此。如果您和您的竞争对手(甚至更好的是,所有竞争对手)以相同的方式执行某些活动或序列,并且您的系统不可能以任何其他方式执行,那么就没有必要在架构中为该活动分配组件。这样做会造成功能分解。当您遇到竞争对手做同样的事情时,很可能它代表了业务的性质,正如前面所讨论的那样,您不应该封装它。
The opposite case is also true. If you and your competitor (and even better, all competitors) do some activity or sequence the same way, and there is no chance of your system doing it any other way, then there is no need to allocate a component in the architecture for that activity. To do so would create a functional decomposition. When you encounter something your competitors do identically, more likely than not, it represents the nature of the business, and as discussed previously, you should not encapsulate it.
波动性与寿命密切相关。公司或应用程序以同样的方式做某事的时间越长,公司继续以同样的方式做事的可能性就越大。换句话说,事情越久不改变,它们改变或被取代的时间就越长。你必须提出一个能适应这种变化的设计,即使乍一看这种变化与当前的需求无关。
Volatility is intimately related to longevity. The longer the company or the application has been doing something the same way, the higher the likelihood the company will keep doing it the same way. Put differently, the longer things do not change, the longer they have until they do change or are replaced. You must put forward a design that accommodates such changes, even if at first glance such changes are independent of the current requirements.
您甚至可以使用简单的启发式方法估计这种变化可能发生的时间:组织(或客户或市场)发起或吸收变化的能力或多或少是恒定的,因为它与业务性质有关。例如,医院 IT 部门比新兴的区块链初创公司更保守,对变化的容忍度更低。因此,事物变化越频繁,未来变化的可能性就越大,但变化速度相同。例如,如果公司每 2 年更改一次工资单系统,则公司很可能会在未来 2 年内更改工资单系统。如果您设计的系统需要与工资系统接口,并且使用系统的时间跨度超过 2 年,那么您必须封装工资系统的波动性并计划包含预期的变化。您必须考虑工资系统变更的影响,即使从未明确要求您进行变更。您应该努力封装系统生命周期内发生的变化。如果预计的寿命为 5 到 7 年,那么一个好的起点是确定过去 7 年中应用程序领域发生的所有变化。类似的变化很可能会在类似的时间跨度内发生。
You can even guesstimate how long it will be until such a change is likely to take place using a simple heuristic: the ability of the organization (or the customer or the market) to instigate or absorb a change is more or less constant because it is tied to the nature of the business. For example, a hospital IT department is more conservative and has less tolerance for change than a nascent blockchain startup. Thus, the more frequently things change, the more likely they will change in the future, but at the same rate. For example, if every 2 years the company changes its payroll system, it is likely the company will change the payroll system within the next 2 years. If the system you design needs to interface with the payroll system and the horizon for using your system is longer than 2 years, then you must encapsulate the volatility in the payroll system and plan to contain the expected change. You must take into account the effect of a payroll system change even if the change was never given to you as an explicit requirement. You should strive to encapsulate changes that occur within the life of the system. If that projected lifespan is 5 to 7 years, a good starting point is identifying all the things that have changed in the application domain over the past 7 years. It is likely similar changes will occur within a similar timespan.
您应该以这种方式检查您的设计与之交互的所有相关系统和子系统的寿命。例如,如果企业资源规划 (ERP) 系统每 10 年更改一次,上次 ERP 更改是 8 年前,而新系统的期限为 5 年,那么 ERP 很可能会在系统的生命周期内发生变化。
You should examine in this way the longevity of all the systems and subsystems with which your design interacts. For example, if the enterprise resource planning (ERP) system changes every 10 years, the last ERP change was 8 years ago, and the horizon for your new system is 5 years, then it is a good bet the ERP will change during the life of your system.
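The arithmetic behind this heuristic can be sketched in a few lines of Python. The function name `likely_to_change` and its parameters are hypothetical, and the sketch assumes the rate of change stays constant, as the heuristic itself does.

```python
def likely_to_change(change_interval_years: float,
                     years_since_last_change: float,
                     horizon_years: float) -> bool:
    """Return True if the next change is expected within the system's horizon.

    Assumes a constant rate of change: the next change is due one full
    interval after the previous one.
    """
    years_until_next_change = max(change_interval_years - years_since_last_change, 0)
    return years_until_next_change <= horizon_years


# ERP example from the text: changes every 10 years, last change 8 years ago,
# 5-year horizon. The next change is due in about 2 years, inside the horizon.
erp_will_change = likely_to_change(10, 8, 5)

# A dependency replaced 1 year ago on the same 10-year cycle is not expected
# to change within a 5-year horizon.
recent_will_change = likely_to_change(10, 1, 5)
```

This is only a first-pass estimate to decide which interacting systems deserve encapsulation; it does not replace judgment about the specific organization.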
如果你只花 2% 的时间在某件事上,那么无论你拥有怎样的智力或使用的方法,你都永远不会擅长这件事。要相信每隔几年就会有人能靠近白板,画几条线,然后确定架构,那需要多么惊人的狂妄自大。专业人士(无论是医生、飞行员、焊工还是律师)的基本期望是,他们要通过培训掌握自己的技能。你不会希望成为飞行员只有少量飞行时间的飞机上的乘客。你也不会希望成为医生的第一位病人。商业航空公司的飞行员要花费数年时间(复数)在模拟器上,并由经验丰富的飞行员通过数百次飞行进行培训。医生在接触第一位病人之前要解剖无数的尸体,即使这样,他们也受到严密的监督。
If you only spend 2% of the time on anything, you will never be any good at it, regardless of your built-in intellect or methodology used. An amazing level of hubris is required to believe that once every few years someone can approach a whiteboard, draw a few lines, and nail the architecture. The basic expectation of professionals, be they doctors, pilots, welders or lawyers, is that they master their craft by training for it. You would not wish to be the passenger aboard a plane where the pilot has only a handful of flying hours. You would not wish to be the first patient of a doctor. Commercial airline pilots spend years (plural) in simulators and are trained through hundreds of flights by veteran pilots. Doctors dissect countless cadavers before they can touch the first patient, and even then, they are closely supervised.
识别波动性领域是一项后天习得的技能。几乎没有任何软件架构师最初接受过基于波动性的分解培训,绝大多数系统和项目都使用功能分解(结果很糟糕)。掌握基于波动性的分解的最佳方法是实践。这是解决 2% 问题的唯一方法。以下是您可以开始的几种方法:
Identifying areas of volatility is an acquired skill. Hardly any software architect is initially trained in volatility-based decomposition, and the vast majority of systems and projects use functional decomposition (with abysmal results). The best way of going about mastering volatility-based decomposition is to practice. This is the only way to address the 2% problem. Here are several ways you can start:
在您熟悉的日常软件系统上练习,例如典型的保险公司、移动应用程序、银行或在线商店。
Practice on an everyday software system with which you are familiar, such as your typical insurance company, a mobile app, a bank, or an online store.
检查一下你自己过去的项目。事后看来,你已经知道痛点是什么了。过去那个项目的架构是否在功能上完成了?事情真的发生了变化吗?这些变化的连锁反应是什么?如果你能概括这种波动性,你是否能够更好地应对这种变化?
Examine your own past projects. In hindsight, you already know what the pain points were. Was the architecture of that past project done functionally? What things did change? What were the ripple effects of those changes? If you had encapsulated that volatility, would you have been able to deal with that change better?
看看你当前的项目。现在挽救它还不算太晚:它是否功能齐全?你能列出易变的领域并提出更好的架构吗?
Look at your current project. It may not be too late to save it: Is it designed functionally? Can you list the areas of volatility and propose a superior architecture?
观察非软件系统,例如自行车、笔记本电脑、房屋,并从中找出波动的领域。
Look at non-software systems such as a bicycle, a laptop, a house, and identify in those the areas of volatility.
然后重复做,再多做几次。反复练习。在分析了三到五个系统之后,你应该掌握了一般技巧。遗憾的是,学会识别波动区域并不是通过观察别人就能掌握的。你不能从书本上学会骑自行车。你必须骑上自行车(并摔倒)几次。基于波动性的分解也是如此。然而,在练习中摔倒总比在真实对象上做实验要好。
Then do it again and do it some more. Practice and practice. After you have analyzed three to five systems, you should get the general technique. Sadly, learning to identify areas of volatility is not something you get to master by watching others. You cannot learn to ride a bicycle from a book. You have to mount a bicycle (and fall) a few times. The same is true with volatility-based decomposition. It is, however, preferable to fall during practice than to experiment on live subjects.
上一章讨论了基于波动性的分解这一通用设计原则。这一原则支配着所有实际系统的设计 — — 从房屋、笔记本电脑、巨型飞机到您自己的身体。为了生存和发展,它们都封装了其组成部分的波动性。软件架构师只需设计软件系统。幸运的是,这些系统具有共同的波动性领域。多年来,我在数百个系统中发现了这些共同的波动性领域。此外,这些共同的波动性领域之间存在典型的交互、约束和运行时关系。如果您认识到这些,您就可以快速、高效、有效地生成正确的系统架构。
The previous chapter discussed the universal design principle of volatility-based decomposition. This principle governs the design of all practical systems—from houses, to laptops, to jumbo planes, to your own body. To survive and thrive, they all encapsulate the volatility of their constituent components. Software architects only have to design software systems. Fortunately, these systems share common areas of volatility. Over the years I have found these common areas of volatility within hundreds of systems. Furthermore, there are typical interactions, constraints, and run-time relationships between these common areas of volatility. If you recognize these, you can produce correct system architecture quickly, efficiently, and effectively.
鉴于这一观察,该方法为波动性领域提供了模板、交互指南并推荐了操作模式。通过这样做,该方法超越了单纯的分解。能够在大多数软件系统中提供这样的一般指导方针和结构可能听起来有些牵强。您可能想知道这些广泛的概述如何可能适用于各种软件系统。原因是好的架构允许在不同的环境中使用。例如,老鼠和大象有很大的不同,但它们使用相同的架构。然而,老鼠和大象的详细设计却大不相同。同样,该方法可以为您提供系统架构,但不能提供其详细设计。
Given this observation, The Method provides a template for the areas of volatility, offers guidelines for their interaction, and recommends operational patterns. By doing so, The Method goes beyond mere decomposition. Being able to furnish such general guidelines and structure across most software systems may sound far-fetched. You may wonder how these kinds of broad strokes could possibly apply across the diversity of software systems. The reason is that good architectures allow use in different contexts. For example, a mouse and an elephant are vastly different, yet they use identical architecture. The detailed designs of the mouse and the elephant, however, are very different. Similarly, The Method can provide you with the system architecture, but not its detailed design.
本章主要介绍方法构建系统的方式、这种方式带来的优势以及对架构的影响。您将看到基于语义的服务分类和相关指南,以及如何分层设计。此外,对架构中的组件及其关系使用清晰、一致的命名法还有另外两个好处。首先,它提供了一个良好的起点。您仍然需要为此付出努力,但至少您从一个合理的起点开始。其次,它改善了沟通,因为您现在可以将您的设计意图传达给其他架构师或开发人员。即使以这种方式与自己交流也很有价值,因为它有助于理清自己的想法。
This chapter is all about The Method’s way of structuring a system, the advantages this brings, and its implications on the architecture. You will see classification of services based on their semantics and the associated guidelines, as well as how to layer your design. In addition, having clear, consistent nomenclature for components in your architecture and their relationship brings two other advantages. First, it provides a good starting point. You will still have to sweat over it, but at least you start at a reasonable point. Second, it improves communication because you can now convey your design intent to other architects or developers. Even communicating with yourself in this way is very valuable, as it helps to clarify your own thoughts.
在深入研究架构之前,请先考虑需求。大多数项目(如果它们愿意捕捉需求的话)都使用功能需求。功能需求只是陈述所需的功能,例如“系统应该做A”。这实际上是一种糟糕的需求指定方式,因为它使系统A功能的实现留有余地。事实上,功能需求为客户和营销部门之间、营销部门和工程部门之间,甚至开发人员之间产生误解提供了多种机会。这种模糊性往往会持续存在,直到您已经花费大量精力开发和部署系统,此时纠正它是最昂贵的。
Before diving into architecture, consider requirements. Most projects, if they even bother to capture the requirements, use functional requirements. Functional requirements simply state the required functionality, such as “The system should do A.” This is actually a poor way of specifying requirements, because it leaves the system’s implementation of the A functionality open for interpretation. In fact, functional requirements allow for multiple opportunities for misinterpretations to arise between the customers and marketing, between marketing and engineering, and even between developers. This kind of ambiguity tends to persist until you have already spent considerable effort on developing and deploying the system, at which point rectifying it is the most expensive.
需求应该捕捉所需的行为,而不是所需的功能。您应该指定系统需要如何运行,而不是它应该做什么,这可以说是需求收集的本质。与大多数其他事情一样,这确实需要额外的工作和努力(人们一般会尽量避免),因此将需求转化为这种形式将是一项艰巨的任务。
Requirements should capture the required behavior rather than the required functionality. You should specify how the system is required to operate as opposed to what it should do, which is arguably the essence of requirements gathering. As with most other things, this does take additional work and effort (something that people in general try to avoid), so getting requirements into this form will be an uphill struggle.
用例是所需行为的表达,即系统需要如何完成某些工作并为业务增加价值。因此,用例是系统中活动的特定序列。用例往往冗长且具有描述性。它们可以描述最终用户与系统的交互,或系统与其他系统的交互,或后端处理。这种能力很重要,因为在任何设计良好的系统中,即使是中等规模和复杂度的系统,用户也只会与系统的一小部分进行交互或观察,这代表了冰山一角。系统的大部分仍处于水线以下,您也应该为其生成用例。
A use case is an expression of required behavior—that is, how the system is required to go about accomplishing some work and adding value to the business. As such, a use case is a particular sequence of activities in the system. Use cases tend to be verbose and descriptive. They can describe end-user interactions with the system, or the system’s interactions with other systems, or back-end processing. This ability is important because in any well-designed system, even one of modest size and complexity, the users interact with or observe just a small part of the system, which represents the tip of the iceberg. The bulk of the system remains below the waterline, and you should produce use cases for it as well.
您可以通过文本或图形捕获用例。文本用例易于生成,这是一个明显的优势。不幸的是,使用文本描述用例是一种较差的用例描述方式,因为用例可能过于复杂,无法以高保真度在文本中捕获。文本用例的真正问题是几乎没有人愿意阅读哪怕是简单的文本,这是有原因的。阅读对于人类大脑来说是一种人为的活动,因为大脑无法通过文本轻松吸收和处理复杂的想法。人类阅读已有 5000 年的历史——从进化的角度来说,阅读时间还不足以让大脑跟上(不过,感谢您为本书付出的努力)。
You can capture use cases either textually or graphically. Textual use cases are easy to produce, which is a distinct advantage. Unfortunately, using text for use cases is an inferior way of describing use cases because the use cases may be too complex to capture with high fidelity in text. The real problem with textual use cases is that hardly anyone bothers to read even simple text, and for a good reason. Reading is an artificial activity for the human brain, because the brain is not wired to easily absorb and process complex ideas via text. Mankind has been reading for 5000 years—not long enough for the brain to catch up, evolutionarily speaking (thank you for making the effort with this book, though).
捕捉用例的最佳方式是使用图表(图 3-1)。人类处理图像的速度非常快,因为人类大脑的近一半是一个巨大的视频处理单元。图表允许您利用这个处理器将想法传达给您的受众。
The best way of capturing a use case is graphically, with a diagram (Figure 3-1). Humans perform image processing astonishingly quickly, because almost half the human brain is a massive video processing unit. Diagrams allow you to take advantage of this processor to communicate ideas to your audience.
图 3-1用例图
Figure 3-1 A use case diagram
但是,图形化用例的制作非常耗费人力,尤其是大量用例时。许多用例可能足够简单,无需图表即可理解。例如,图 3-1中的用例图可以用文本同样好地表示。我的经验法则:嵌套的“if”的存在告诉您应该绘制用例。没有读者能够解析包含嵌套“if”的句子。相反,读者可能会不断重读用例,或者更有可能拿起笔和纸,尝试自己想象用例。通过这样做,读者正在解释行为 - 这也增加了误解的可能性。当读者在文本用例的一侧涂鸦时,您知道您应该首先提供可视化效果。图表还允许读者轻松理解复杂用例中大量嵌套的“if”。
Graphical use cases, however, can be very labor-intensive to produce, especially in large numbers. Many use cases may be simple enough to understand without a diagram. For example, the use case diagram in Figure 3-1 can be represented in text equally well. My rule of thumb: The presence of a nested “if” tells you that you should draw the use case. No reader can parse a sentence containing a nested “if.” Instead, readers will continually reread the use case or, more likely, pick up a pen and paper and try to visualize the use case themselves. By doing so, readers are interpreting the behavior, which also raises the possibility of misinterpretation. When readers are scribbling on the side of your textual use case, you know you should have provided the visualization in the first place. Diagrams also allow readers to easily follow a larger number of nested “if”s in a complex use case.
方法更喜欢使用活动图1来图形化表示用例,主要是因为活动图可以捕获行为的时间关键方面,这是流程图和其他图表无法做到的。您无法在流程图中表示并行执行、阻塞或等待某个事件发生。相比之下,活动图包含并发概念。例如,在图 3-2中,您可以直观地看到并行执行的处理是对事件的响应,甚至无需查看图表的符号指南。还请注意,遵循嵌套条件是多么容易。
The Method prefers activity diagrams1 for graphical representation of use cases, primarily because activity diagrams can capture time-critical aspects of behavior, something that flowcharts and other diagrams are incapable of doing. You cannot represent parallel execution, blocking, or waiting for some event to take place in a flowchart. Activity diagrams, by contrast, incorporate a notion of concurrency. For example, in Figure 3-2, you intuitively see the handling of parallel execution as a response to the event without even seeing a notation guide for the diagram. Note also how easy it is to follow the nested condition.
图 3-2活动图
Figure 3-2 An activity diagram
软件系统通常采用分层设计,而该方法在很大程度上依赖于分层。分层允许您分层封装。每一层都封装了来自上层的波动性以及来自下层的波动性。层内的服务封装了彼此之间的波动性,如图 3-3所示。
Software systems are typically designed in layers, and The Method relies heavily on layers. Layers allow you to layer encapsulation. Each layer encapsulates its own volatilities from the layers above and the volatilities in the layers below. Services inside the layers encapsulate volatility from each other, as shown in Figure 3-3.
图 3-3服务和层次
Figure 3-3 Services and layers
即使是简单的系统也应该分层设计以获得封装的好处。理论上,层数越多,封装效果越好。实际系统只有少数几层,最后是实际物理资源层,例如数据存储或消息队列。
Even simple systems should be designed in layers to gain the benefit of encapsulation. In theory, the more layers, the better the encapsulation. Practical systems will have only a handful of layers, terminating with a layer of actual physical resources such as a data storage or a message queue.
跨层的首选方式是调用服务。虽然您当然可以从方法的结构和基于波动性的分解中获益,即使使用常规类,但依赖服务也具有明显的优势。使用哪种技术和平台来实现服务是次要问题。当您使用服务时(只要您选择的技术允许),您会立即获得以下好处:
The preferred way of crossing layers is by calling services. While you certainly can benefit from the structure of The Method and volatility-based decomposition even with regular classes, relying on services provides distinct advantages. Which technology and platform you use to implement your services is a secondary concern. When you do use services (as long as the technology you chose allows), you immediately gain the following benefits:
可扩展性。服务可以以多种方式实例化,包括按每次调用实例化。这允许大量客户端,而无需对后端资源施加成比例的负载,因为您只需要与正在进行的调用数量相同的服务实例。
Scalability. Services can be instantiated in a variety of ways, including on a per-call basis. This allows for a very large number of clients without placing a proportional load on the back-end resources, as you need only as many service instances as there are calls in progress.
安全性。所有面向服务的平台都将安全性视为头等大事。因此,它们会对所有调用进行身份验证和授权 — 不仅是从客户端应用程序到服务的调用,还包括服务之间的调用。您甚至可以使用某种身份传播机制来支持信任链模式。
Security. All service-oriented platforms treat security as a first-class aspect. Thus, they authenticate and authorize all calls—not just those from the client application to the services, but also those between services. You can even use some identity propagation mechanism to support a chain-of-trust pattern.
吞吐量和可用性。服务可以通过队列接受调用,只需将超额负载排队即可处理大量消息。排队调用还可以提高可用性,因为您可以让多个服务实例处理同一个传入队列。
Throughput and availability. Services can accept calls over queues, allowing you to handle a very large volume of messages by simply queuing up the excess load. Queued calls also enable availability, because you can have multiple service instances process the same incoming queue.
响应能力。服务可以将调用限制到缓冲区以避免系统超负荷。
Responsiveness. Services can throttle the calls into a buffer to avoid maxing out the system.
可靠性。客户端和服务可以使用一些可靠的消息传递协议来保证交付、处理网络连接问题,甚至排序调用。
Reliability. Clients and services can use some reliable messaging protocol to guarantee delivery, handle network connectivity issues, and even order the calls.
一致性。所有服务都可以参与同一个工作单元,无论是在事务中(当由基础设施支持时),还是在最终一致的协调业务事务中。调用链上的任何错误都会导致整个交互中止,而无需根据错误的性质和恢复逻辑将服务耦合在一起。
Consistency. The services can all participate in the same unit of work, either in a transaction (when supported by the infrastructure) or in a coordinated business transaction that is eventually consistent. Any error along the call chain causes the entire interaction to abort, without coupling the services along the nature of the error and the recovery logic.
同步。即使客户端使用多个并发线程,对服务的调用也可以自动同步。
Synchronization. The calls to the service can be automatically synchronized even if the clients use multiple concurrent threads.
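To make the throughput and availability point above concrete, here is a minimal sketch in Python in which several service instances process the same incoming queue. It uses only the standard `queue` and `threading` modules; the function `run_service_instances` and the instance names are hypothetical, and an in-process queue merely stands in for whatever queuing technology a real platform would provide.

```python
import queue
import threading


def run_service_instances(calls: list[str], instance_count: int) -> dict[str, str]:
    """Process queued calls with several service instances.

    Returns a mapping from each call (assumed unique) to the instance
    that handled it.
    """
    incoming: queue.Queue = queue.Queue()
    results: dict[str, str] = {}
    lock = threading.Lock()

    def service_instance(name: str) -> None:
        while True:
            call = incoming.get()
            if call is None:              # sentinel: no more work for this instance
                return
            with lock:                    # protect the shared results mapping
                results[call] = f"handled by {name}"

    workers = [
        threading.Thread(target=service_instance, args=(f"instance-{i}",))
        for i in range(instance_count)
    ]
    for worker in workers:
        worker.start()
    for call in calls:                    # clients only enqueue; excess load waits
        incoming.put(call)
    for _ in workers:                     # one sentinel per instance
        incoming.put(None)
    for worker in workers:
        worker.join()
    return results
```

Which instance handles a given call is not deterministic, and that is the point: the clients never know or care, so an instance can fail or be added without the callers changing.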
该方法要求系统架构中有四层。这些层符合一些经典的软件工程实践。但是,使用波动性来驱动这些层内的分解对你来说可能很陌生。图 3-4描述了该方法中的典型层。
The Method calls for four layers in the system architecture. These layers conform to some classic software engineering practices. However, using volatility to drive the decomposition inside these layers may be new to you. Figure 3-4 depicts the typical layers in The Method.
图 3-4方法中的典型层
Figure 3-4 Typical layers in The Method
架构中的顶层是客户端层,也称为表示层。我觉得“表示”一词有些误导。“表示”意味着向人类用户呈现一些信息,好像这就是对顶层的全部期望。客户端层中的元素很可能是最终用户应用程序,但也可能是与您的系统交互的其他系统。这是一个重要的区别:通过将其称为客户端层,您可以平等对待所有可能的客户端,以相同的方式对待它们。所有客户端(无论是最终用户应用程序还是其他系统)都使用相同的系统入口点(任何优秀设计的一个重要方面),并遵守相同的访问安全性、数据类型和其他接口要求。这反过来又促进了重用和可扩展性,并允许更容易的维护,因为一个入口点的修复会以相同的方式影响所有客户端。
The top layer in the architecture is the client layer, also known as the presentation layer. I find the term “presentation” to be somewhat misleading. “Presentation” implies some information is being presented to human users, as if that is all that is expected from the top layer. The elements in the client layer may very well be end-user applications, but they can also be other systems interacting with your system. This is an important distinction: By calling this the client layer, you equalize all possible clients, treating them in the same way. All Clients (whether end-user applications or other systems) use the same entry points to the system (an important aspect of any good design) and are subject to the same access security, data types, and other interfacing requirements. This, in turn, promotes reuse and extensibility and allows for easier maintenance, as a fix at one entry point affects all Clients the same way.
让客户端使用服务可以更好地将表示与业务逻辑分开。大多数面向服务的技术对于它们允许通过端点传输的数据类型非常严格。这限制了将客户端与服务耦合的能力,统一对待所有客户端,并且至少在理论上更容易添加不同类型的客户端。
Having the Clients consume services caters to better separation of presentation from business logic. Most service-oriented technologies are very strict about the types of data they allow over the endpoints. This limits the ability to couple the Clients to the services, treats all Clients uniformly, and makes adding different types of Clients, at least in theory, easier to accomplish.
客户端层还封装了客户端中的潜在波动性。您的系统现在和将来在波动性方面可能拥有不同的客户端,例如桌面应用程序、Web 门户、移动应用程序、全息图和增强现实、API、管理应用程序等。各种客户端应用程序将使用不同的技术,以不同的方式部署,拥有自己的版本和生命周期,并且可能由不同的团队开发。事实上,客户端层通常是典型软件系统中最不稳定的部分。但是,所有这些波动性都封装在客户端层的各个块中,并且一个组件中的更改不会影响另一个客户端组件。
The client layer also encapsulates the potential volatility in Clients. Your system now and in the future across the axes of volatility may have different Clients such as desktop applications, web portals, mobile apps, holograms and augmented reality, APIs, administration applications, and so on. The various Client applications will use different technologies, be deployed differently, have their own versions and life cycles, and may be developed by different teams. Indeed, the client layer is often the most volatile part of a typical software system. However, all of that volatility is encapsulated in the various blocks of the client layer, and changes in one component do not affect another Client component.
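The idea that every Client goes through the same entry point, under the same rules, can be sketched as follows. This is an illustration only: `Request`, `entry_point`, the token set, and the two client functions are hypothetical names, and the hard-coded tokens stand in for whatever authentication a real service platform provides.

```python
from dataclasses import dataclass


@dataclass
class Request:
    caller: str       # identity of the Client making the call
    token: str        # credential checked at the entry point
    payload: dict     # data contract shared by all Clients


VALID_TOKENS = {"secret-mobile", "secret-partner"}   # hypothetical credentials


def entry_point(request: Request) -> str:
    """Single system entry point; the same checks apply to every Client."""
    if request.token not in VALID_TOKENS:
        raise PermissionError(f"{request.caller} is not authorized")
    if "order_id" not in request.payload:
        raise ValueError("payload must contain 'order_id'")
    return f"accepted order {request.payload['order_id']} from {request.caller}"


def mobile_app_client() -> str:
    """An end-user application acting as a Client."""
    return entry_point(Request("mobile-app", "secret-mobile", {"order_id": 17}))


def partner_system_client() -> str:
    """Another system acting as a Client; it gets no special treatment."""
    return entry_point(Request("partner-system", "secret-partner", {"order_id": 42}))
```

Because both Clients share one entry point, tightening a check in `entry_point` affects them identically, and neither Client needs to know the other exists.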
业务逻辑层封装了系统业务逻辑中的易变性。此层实现系统所需的行为,如前所述,这些行为最好在用例中表达。如果用例是静态的,那么就不会有需要业务逻辑层。然而,用例在客户和时间上都是不稳定的。由于用例包含系统中的一系列活动,因此特定用例只能以两种方式更改:要么序列本身发生变化,要么用例内的活动发生变化。例如,考虑图 3-1中的用例与图 3-5中的用例。
The business logic layer encapsulates the volatility in the system’s business logic. This layer implements the system’s required behavior, which, as mentioned previously, is best expressed in use cases. If the use cases were static, there would be no need for a business logic layer. Use cases, however, are volatile, across both customers and time. Since a use case contains a sequence of activities in the system, a particular use case can change in only two ways: Either the sequence itself changes or the activities within the use case change. For example, consider the use case in Figure 3-1 versus the use cases in Figure 3-5.
图 3-5序列波动率
Figure 3-5 Sequence volatility
图 3-1和3-5中的所有四个用例都使用相同的活动A、B和C,但每个序列都是唯一的。这里的关键观察是工作流的顺序或编排可以独立于活动而改变。
All four use cases in Figures 3-1 and 3-5 use the same activities A, B, and C, but each sequence is unique. The key observation here is that the sequence or the orchestration of the workflow can change independently from the activities.
现在考虑图 3-6中的两个活动图。它们都调用完全相同的序列,但它们使用不同的活动。活动可以独立于序列而改变。
Now consider the two activity diagrams in Figure 3-6. Both call for exactly the same sequence, but they use different activities. The activities can change independently from the sequence.
图 3-6活动波动性
Figure 3-6 Activity volatility
序列和活动都是不稳定的,在方法中,这些不稳定被封装在称为管理器和引擎的特定组件中。管理器组件封装序列中的不稳定,而引擎组件封装活动中的不稳定。在第 2 章的股票交易分解示例中,Trade Workflow组件(见图2-13)是管理器,而Feed Transformation组件是引擎。
Both the sequence and the activities are volatile, and in The Method these volatilities are encapsulated in specific components called Managers and Engines. Manager components encapsulate the volatility in the sequence, whereas Engine components encapsulate the volatility in the activity. In Chapter 2, in the stock trading decomposition example, the Trade Workflow component (see Figure 2-13) is a Manager, while the Feed Transformation component is an Engine.
由于用例通常是相关的,管理器倾向于封装一系列逻辑相关的用例,例如特定子系统中的用例。例如,对于第 2 章的股票交易系统,Analysis Workflow是一个独立于Trade Workflow的管理器,每个管理器都有一组相关的用例要执行。引擎的范围更受限制,并封装了业务规则和活动。
Since use cases are often related, Managers tend to encapsulate a family of logically related use cases, such as those in a particular subsystem. For example, with the stock trading system of Chapter 2, Analysis Workflow is a separate Manager from Trade Workflow, and each Manager has its own related set of use cases to execute. Engines have more restricted scope and encapsulate business rules and activities.
由于序列中可以有很大的波动性,而序列中的活动却没有任何波动性(见图3-5),因此管理器可以使用零个或多个引擎。引擎可以在管理器之间共享,因为您可以代表一个管理器在一个用例中执行活动,然后在单独的用例中为另一个管理器执行相同的活动。您应该在设计引擎时考虑重用。但是,如果两个管理器使用两个不同的引擎来执行相同的活动,那么您要么需要进行功能分解,要么会忽略一些活动的波动性。您将在本章后面看到有关管理器和引擎的更多信息。
Since you can have great volatility in the sequence without any volatility in the activities of the sequence (see Figure 3-5), Managers may use zero or more Engines. Engines may be shared between Managers because you could perform an activity in one use case on behalf of one Manager and then perform the same activity for another Manager in a separate use case. You should design Engines with reuse in mind. However, if two Managers use two different Engines to perform the same activity, you either have functional decomposition on your hands or you have missed some activity volatility. You will see more on Managers and Engines later in this chapter.
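The split between sequence volatility and activity volatility can be illustrated with a minimal Python sketch. All names here (`FormattingEngine`, `ReportManager`, the activities A, B, and C) are hypothetical and chosen only for illustration; they are not taken from the book's examples. The Manager holds nothing but the order of the steps, while the Engine holds the one activity whose way of being done may vary.

```python
from typing import Callable, Dict, List


class FormattingEngine:
    """Hypothetical Engine: encapsulates how a single activity is performed."""

    def __init__(self, style: str) -> None:
        self._style = style

    def format(self, text: str) -> str:
        # Swapping the style changes the activity without touching any Manager.
        return text.upper() if self._style == "upper" else text.lower()


class ReportManager:
    """Hypothetical Manager: encapsulates only the order of the activities."""

    def __init__(self, engine: FormattingEngine, sequence: List[str]) -> None:
        self._engine = engine
        self._sequence = sequence

    def execute(self, text: str) -> List[str]:
        activities: Dict[str, Callable[[], str]] = {
            "A": lambda: "A:" + self._engine.format(text),
            "B": lambda: "B:validated",
            "C": lambda: "C:stored",
        }
        # The Manager's own logic is just the sequencing of the activities.
        return [activities[name]() for name in self._sequence]


engine = FormattingEngine("upper")  # one Engine shared by both Managers
print(ReportManager(engine, ["A", "B", "C"]).execute("order"))
print(ReportManager(engine, ["C", "A", "B"]).execute("order"))
```

Changing the list passed as `sequence` alters the use case without touching the Engine, and changing the Engine's style alters the activity without touching either Manager. Both Managers share the same Engine instance, which is the kind of reuse the text recommends designing for.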
资源访问层恰如其名,它封装了访问资源的易变性,此层中的组件称为ResourceAccess。例如,如果资源是数据库,则有数十种方法可用于访问数据库,并且没有一种方法在各个方面都优于所有其他方法。随着时间的推移,您可能希望更改访问数据库的方式,因此应该封装所涉及的更改或易变性。请注意,您不应简单地封装访问资源的易变性;也就是说,您还必须封装资源本身的易变性,例如本地数据库与基于云的数据库,或内存存储与持久存储。资源更改也必然会更改ResourceAccess。
The aptly named resource access layer encapsulates the volatility in accessing a resource, and the components in this layer are called ResourceAccess. For example, if the resource is a database, literally dozens of methods are available for accessing a database, and no single method is superior to all other methods in every respect. Over time, you may want to change the way you access the database, so that change or the volatility involved should be encapsulated. Note that you should not simply encapsulate the volatility in accessing the resource; that is, you must also encapsulate the volatility in the resource itself, such as a local database versus a cloud-based database, or in-memory storage versus durable storage. Resource changes invariably change ResourceAccess as well.
虽然资源访问层背后的动机显而易见,并且许多系统都包含某种形式的访问层,但大多数此类层最终都会通过创建类似于 I/O 操作或 CRUD 的ResourceAccess契约而暴露底层的波动性。例如,如果您的ResourceAccess服务契约包含Select()、Insert()和Delete()等操作,则底层资源很可能是数据库。如果您稍后将数据库更改为基于云的分布式哈希表,那么该数据库访问类契约将变得毫无用处,并且需要新的契约。更改契约会影响使用ResourceAccess组件的每个引擎和管理器。同样,您必须避免Open()、Close()、Seek()、Read()和Write()等操作,因为这些操作会将底层资源暴露为文件。设计良好的ResourceAccess组件会在其契约中公开围绕资源的原子业务动词。
While the motivation behind the resource access layer is readily evident and many systems incorporate some form of an access layer, most such layers end up exposing the underlying volatility by creating a ResourceAccess contract that resembles I/O operations or that is CRUD-like. For example, if your ResourceAccess service contract contains operations such as Select(), Insert(), and Delete(), the underlying resource is most likely a database. If you later change the database to a distributed cloud-based hash table, that database-access-like contract will become useless, and a new contract is required. Changing the contract affects every Engine and Manager that has used the ResourceAccess component. Similarly, you must avoid operations such as Open(), Close(), Seek(), Read(), and Write() that betray the underlying resource as being a file. A well-designed ResourceAccess component exposes in its contract the atomic business verbs around a resource.
系统中的Manager服务执行一些业务活动序列。这些活动又常常包含一组更细粒度的活动。但是,在某些时候,您将拥有如此低级别的活动,以至于它们无法通过系统中的任何其他活动来表达。该方法将这些不可分割的活动称为原子业务动词。例如,在银行,一个典型的用例是在两个账户之间转账。转账是通过贷记一个账户并借记另一个账户来完成的。在银行,从业务角度来看,贷记和借记是原子操作。请注意,从系统角度来看,原子业务动词可能需要几个步骤才能实现。原子性是面向业务的,而不是面向系统的。
The Manager services in the system execute some sequence of business activities. These activities, in turn, often comprise an even more granular set of activities. However, at some point you will have such low-level activities that they cannot be expressed by any other activity in the system. The Method refers to these indivisible activities as atomic business verbs. For example, in a bank, a classic use case would be to transfer money between two accounts. The transfer is done by crediting one account and debiting another. In a bank, credit and debit are atomic operations from the business’s perspective. Note that an atomic business verb may require several steps from the system perspective to implement. The atomicity is geared toward the business, not the system.
原子业务动词实际上是不可变的,因为它们与业务性质密切相关,而业务性质几乎从未改变,如第 2 章所述。例如,自美第奇家族时代以来,银行就一直进行信贷和借记业务。在内部,ResourceAccess服务应将其契约中的这些动词转换为针对资源的 CRUD 或 I/O。通过仅公开稳定的原子业务动词,当ResourceAccess服务发生更改时,只有访问组件的内部会发生变化,而不是其上的整个系统。
Atomic business verbs are practically immutable because they relate strongly to the nature of the business, which, as discussed in Chapter 2, hardly ever changes. For example, since the time of the Medici, banks have performed credit and debit operations. Internally, the ResourceAccess service should convert these verbs from its contract into CRUDs or I/O against the resources. By exposing only the stable atomic business verbs, when the ResourceAccess service changes, only the internals of the access component change, rather than the whole system atop it.
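As a rough illustration of a contract built on atomic business verbs, the following Python sketch uses the bank example from the text. The class and method names (`AccountAccess`, `AccountManager`, `credit`, `debit`, `balance_of`, `transfer`) and the in-memory dictionary standing in for the Resource are assumptions made for this sketch, not an implementation prescribed by The Method.

```python
from typing import Dict


class AccountAccess:
    """Hypothetical ResourceAccess: the contract exposes credit and debit,
    not Select/Insert/Delete or Open/Read/Write."""

    def __init__(self) -> None:
        # In-memory stand-in for the Resource (an assumption for this sketch).
        # A database or a cloud store could replace it without changing
        # the method signatures below.
        self._balances: Dict[str, int] = {}

    def credit(self, account_id: str, cents: int) -> None:
        self._balances[account_id] = self._balances.get(account_id, 0) + cents

    def debit(self, account_id: str, cents: int) -> None:
        self._balances[account_id] = self._balances.get(account_id, 0) - cents

    def balance_of(self, account_id: str) -> int:
        # Read-only helper added only so the example can be inspected.
        return self._balances.get(account_id, 0)


class AccountManager:
    """Hypothetical Manager: sequences the atomic verbs into a use case."""

    def __init__(self, access: AccountAccess) -> None:
        self._access = access

    def transfer(self, source: str, target: str, cents: int) -> None:
        self._access.debit(source, cents)
        self._access.credit(target, cents)


access = AccountAccess()
access.credit("alice", 10_000)
AccountManager(access).transfer("alice", "bob", 2_500)
print(access.balance_of("alice"), access.balance_of("bob"))
```

The Manager never sees how balances are stored. If the dictionary were swapped for a different kind of store, only the bodies of `credit` and `debit` would change, which is the point the text makes about keeping CRUD and I/O inside the access component.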
ResourceAccess服务可以在Manager和Engine之间共享。您应该明确设计ResourceAccess组件时考虑到这种重用。如果两个Manager或两个Engine在访问同一资源时不能使用相同的ResourceAccess服务,或者需要进行某些特定访问,则可能是您没有封装某些访问波动性,或者没有正确隔离原子业务动词。
ResourceAccess services can be shared between Managers and Engines. You should explicitly design ResourceAccess components with this reuse in mind. If two Managers or two Engines cannot use the same ResourceAccess service when accessing the same resource or have some need for specific access, perhaps you did not encapsulate some access volatility or did not isolate the atomic business verbs correctly.
资源层包含系统所依赖的实际物理资源,例如数据库、文件系统、缓存或消息队列。在方法中,资源可以是系统内部的,也可以是系统外部的。通常,资源本身就是一个完整的系统,但对于您的系统来说,它只是一个资源。
The resource layer contains the actual physical Resources on which the system relies, such as a database, file system, a cache, or a message queue. In The Method, the Resource can be internal to the system or outside the system. Often, the Resource is a whole system in its own right, but to your system it appears as just a Resource.
图 3-4右侧的实用程序垂直栏包含实用程序服务。这些服务是几乎所有系统运行所需的某种形式的通用基础设施。实用程序可能包括Security、Logging、Diagnostics、Instrumentation、Pub/Sub、Message Bus、Hosting等等。您将在本章后面看到,与其他组件相比,实用程序需要不同的规则。
The utilities vertical bar on the right side of Figure 3-4 contains Utility services. These services are some form of common infrastructure that nearly all systems require to operate. Utilities may include Security, Logging, Diagnostics, Instrumentation, Pub/Sub, Message Bus, Hosting, and more. You will see later in this chapter that Utilities require different rules compared with the other components.
正如每个好主意一样,方法也可能被滥用。如果没有实践和批判性思维,就有可能只使用方法分类法的名义,而仍然会产生功能分解。通过遵循本节提供的简单指南,您可以在很大程度上减轻这种风险。
As is true for every good idea, The Method can be abused. Without practice and critical thinking, it is possible to use The Method taxonomy in name only and still produce a functional decomposition. You can mitigate this risk to a great extent by adhering to the simple guidelines provided in this section.
利用指南的另一个用途是启动设计。在几乎每个设计工作的开始阶段,大多数人都会感到困惑,不知道从哪里开始。掌握一些关键的观察结果非常有帮助,这些观察结果既可以启动一项初露头角的设计工作,也可以验证一项初露头角的设计工作。
Another use for leveraging guidelines is initiating design. At the beginning of nearly every design effort, most people are stumped, unsure where to even start. It is very helpful to be armed with some key observations that can both initiate and validate a budding design effort.
服务名称和图表对于向他人传达您的设计非常重要。描述性名称在业务和资源访问层中非常重要,因此The Method建议使用以下约定来命名它们:
Service names as well as diagrams are important in communicating your design to others. Descriptive names are so important within the business and resource access layers that The Method recommends the following conventions for naming them:
服务名称必须是用 Pascal 大小写书写的两部分复合词。
Names of services must be two-part compound words written in Pascal case.
名称的后缀始终是服务的类型 — 例如Manager、Engine或Access(对于ResourceAccess)。
The suffix of the name is always the service’s type—for example, Manager, Engine, or Access (for ResourceAccess).
前缀根据服务类型的不同而不同。
– 对于经理,前缀应该是与用例中封装的波动性相关的名词。
– 对于引擎,前缀应该是描述封装活动的名词。
– 对于ResourceAccess,前缀应该是与Resource相关的名词,例如服务向消费用例提供的数据。
The prefix varies with the type of service.
– For Managers, the prefix should be a noun associated with the encapsulated volatility in the use cases.
– For Engines, the prefix should be a noun describing the encapsulated activity.
– For ResourceAccess, the prefix should be a noun associated with the Resource, such as data that the service provides to the consuming use cases.
动名词(动名词是在动词上添加“ing”而形成的名词)应仅在引擎中用作前缀。在业务层或访问层的其他位置使用动名词通常表示功能分解。
Gerunds (a gerund is a noun created by tacking “ing” onto a verb) should be used as a prefix only with Engines. The use of gerunds elsewhere in the business or access layers usually signals functional decomposition.
原子业务动词不应用作服务名称的前缀。这些动词应仅限于与资源访问层交互的契约中的操作名称。
Atomic business verbs should not be used in a prefix for a service name. These verbs should be confined to operation names in contracts interfacing with the resource access layer.
例如,在银行设计中,AccountManager和AccountAccess是可接受的服务名称。但是,名称BillingManager和BillingAccess带有功能分解的味道,因为动名词前缀传达的是“做”的概念,而不是编排或访问波动性。CalculatingEngine是一个很好的候选名称,因为引擎“做”诸如聚合、调整、制定战略、验证、评级、计算、转换、生成、调节、翻译、定位和搜索之类的事情。相比之下,名称AccountEngine没有任何活动波动性的指标,并且再次带有强烈的功能或域分解的味道。
As examples, in a bank design, AccountManager and AccountAccess are acceptable service names. However, the names BillingManager and BillingAccess smell of functional decomposition because the gerund prefixes convey a concept of “doing” rather than of an orchestration or access volatility. CalculatingEngine is a good candidate name because Engines “do” things such as aggregate, adapt, strategize, validate, rate, calculate, transform, generate, regulate, translate, locate, and search. The name AccountEngine, by contrast, is devoid of any indicator of the activity volatility and again carries a strong smell of functional or domain decomposition.
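The mechanical parts of these naming conventions can be checked with a few lines of Python. This is only a sketch: the function name `check_service_name` is hypothetical, and the gerund test is a crude "ends in ing" heuristic that would misfire on ordinary nouns such as "String". It also cannot detect a name like AccountEngine, whose problem is semantic (no indication of an activity) rather than syntactic.

```python
import re
from typing import List

SUFFIXES = ("Manager", "Engine", "Access")


def check_service_name(name: str) -> List[str]:
    """Return a list of convention problems; an empty list means none found."""
    suffix = next((s for s in SUFFIXES if name.endswith(s)), None)
    if suffix is None:
        return ["suffix must be Manager, Engine, or Access"]

    problems: List[str] = []
    prefix = name[: -len(suffix)]
    # Two-part compound word: exactly one Pascal-case word before the suffix.
    if not re.fullmatch(r"[A-Z][a-z0-9]+", prefix):
        problems.append("prefix must be a single Pascal-case word")
    # Crude gerund heuristic: a prefix ending in "ing".
    if prefix.endswith("ing") and suffix != "Engine":
        problems.append("gerund prefix outside an Engine suggests functional decomposition")
    return problems


for candidate in ("AccountManager", "BillingManager", "CalculatingEngine"):
    print(candidate, check_service_name(candidate))
```

A check like this is best treated as a prompt for a design conversation rather than a gate, since a name can pass it and still hide a functional decomposition.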
架构中的服务和资源层松散地对应于“谁”、“什么”、“如何”和“哪里”四个英语问题。“谁”与系统交互在客户端中,“什么”对系统有要求在管理器中,“如何”系统执行业务活动在引擎中,“如何”系统访问资源在资源访问中,“哪里”系统状态在资源中(见图3-7)。
The layers of services and resources in the architecture loosely correspond to the four English questions of “who,” “what,” “how,” and “where.” “Who” interacts with the system is in the Clients, “what” is required of the system is in Managers, “how” the system performs business activities is in Engines, “how” the system accesses Resources is in ResourceAccess, and “where” the system state is in Resources (see Figure 3-7).
图 3-7问题和层次
Figure 3-7 Questions and layers
这四个问题大致对应各个层级,因为波动性高于一切。例如,如果“如何”的波动性很小或没有波动性,那么管理者可以同时执行“什么”和“如何”。
The four questions loosely correspond to the layers because volatility trumps everything. For example, if there is little or no volatility in the “how,” the Managers can perform both “what” and “how.”
提出并回答这四个问题对于设计工作的启动和验证都很有用。如果您只有一张白纸,不知道从哪里开始,您可以通过回答这四个问题来启动设计工作。列出所有“谁”,并将它们作为客户的候选放入一个箱子中。列出所有“什么”,并将它们作为经理的候选放入另一个箱子中,依此类推。结果不会是完美的——例如,所有“什么”组件不一定都会合并成单个经理——但这是一个开始。
Asking and answering the four questions is useful at both ends of the design effort, for initiation and for validation. If all you have is a clean slate and no clear idea where to start, you can initiate the design effort by answering the four questions. Make a list of all the “who” and put them in one bin as candidates for Clients. Make a list of all the “what” and put them in another bin as candidates for Managers, and so on. The result will not be perfect—for example, all “what” components will not necessarily coalesce into individual Managers—but it is a start.
完成设计后,请退一步检查设计。您的所有客户都是“谁”,其中没有“什么”的痕迹吗?所有经理都是“什么”,其中没有一丝“谁”和“哪里”吗?同样,问题到层的映射并不完美。在某些情况下,您可能会在问题之间出现交叉。但是,如果您确信波动性的封装是合理的,则没有理由进一步怀疑该选择。如果您不相信,这些问题可能表明存在危险信号并需要进行调查。
Once you complete your design, take a step back and examine the design. Are all your Clients “who,” with no trace of “what” in them? Are all the Managers “what,” without a smidgen of “who” and “where” in them? Again, the mapping of questions to layers will not be perfect. In some cases, you could have crossover between questions. However, if you are convinced the encapsulation of the volatility is justified, there is no reason to doubt that choice further. If you are unconvinced, the questions could indicate a red flag and a decomposition to investigate.
这四个问题与之前关于命名服务的指导原则完美契合。如果Manager前缀描述了封装的易变性,那么用“什么”而不是动词“如何”来谈论它们会更自然。如果Engine前缀是描述封装活动的动名词,那么用“如何”而不是“什么”或“哪里”来谈论它们会更自然。出于类似的原因,ResourceAccess封装了“如何”来访问其背后的资源。
The four questions tie in nicely with the previous guideline on naming the services. If the Manager prefixes describe the encapsulated volatilities, it is more natural to talk about them in terms of “what” as opposed to the verb-like “how.” If the Engine prefixes are gerunds describing the encapsulated activities, it is more natural to talk about them in terms of “how” as opposed to “what” or “where.” For similar reasons, ResourceAccess encapsulates “how” to access the Resources that lie behind it.
大多数设计最终所包含的引擎数量都比您最初想象的要少。首先,要使引擎存在,必须有一些基本的操作波动性需要您封装 — 即未知数量的做事方式。这种波动性并不常见。如果您的设计包含大量引擎,您可能无意中进行了功能分解。
Most designs end up with fewer Engines than you might initially imagine. First, for an Engine to exist, there must be some fundamental operational volatility that you should encapsulate—that is, an unknown number of ways of doing something. Such volatilities are uncommon. If your design contains a large number of Engines, you may have inadvertently done a functional decomposition.
在 IDesign 的工作中,我们观察到许多系统中管理器和引擎往往保持黄金比例。如果您的系统只有一个管理器(不是神服务),那么您可能没有引擎,或者最多只有一个引擎。想想看:如果系统非常简单,一个像样的管理器就足够了,那么活动波动性高但用例类型不多的可能性有多大?
In our work at IDesign, we have observed across numerous systems that Managers and Engines tend to maintain a golden ratio. If your system has only one Manager (not a god service), you may have no Engines, or at most one Engine. Think about it: If the system is so simple that one decent Manager suffices, how likely is it to have high volatility in the activities but not that many types of use cases?
一般来说,如果您的系统有两个Manager,则很可能需要一个Engine。如果您的系统有三个Manager,则两个Engine可能是最佳数量。如果您的系统有五个Manager,则可能需要多达三个Engine。如果您的系统有八个Manager,则您已经无法进行良好的设计:大量的Manager强烈表明您进行了功能或领域分解。大多数系统永远不会有那么多Manager,因为它们不会有许多真正独立的、具有自身波动性的用例系列。此外,一个Manager可以支持多个用例系列,通常表示为不同的服务契约或服务的各个方面。这可以进一步减少系统中的Manager数量。
Generally, if your system has two Managers, you will likely need one Engine. If your system has three Managers, two Engines is likely the best number. If your system has five Managers, you may need as many as three Engines. If your system has eight Managers, then you have already failed to produce a good design: The large number of Managers strongly indicates you have done a functional or domain decomposition. Most systems will never have that many Managers because they will not have many truly independent families of use cases with their own volatility. In addition, a Manager can support more than one family of use cases, often expressed as different service contracts, or facets of the service. This can further reduce the number of Managers in a system.
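The rules of thumb above can be captured as a small review helper. The Python sketch below encodes only the data points stated in the text (one, two, three, five, and eight Managers); the names `ENGINES_BY_MANAGERS` and `review_counts` are hypothetical, and Manager counts the text does not mention are deliberately left without guidance rather than interpolated.

```python
from typing import Dict, List

# Likely upper bound on Engines for a given number of Managers,
# limited to the counts the text actually states.
ENGINES_BY_MANAGERS: Dict[int, int] = {1: 1, 2: 1, 3: 2, 5: 3}
TOO_MANY_MANAGERS = 8


def review_counts(managers: int, engines: int) -> List[str]:
    """Return warnings for a design with the given component counts."""
    warnings: List[str] = []
    if managers >= TOO_MANY_MANAGERS:
        warnings.append("too many Managers: possible functional or domain decomposition")
    limit = ENGINES_BY_MANAGERS.get(managers)
    if limit is not None and engines > limit:
        warnings.append("more Engines than the rule of thumb suggests")
    return warnings


print(review_counts(3, 2))
print(review_counts(2, 4))
print(review_counts(8, 3))
```

These numbers are observations rather than hard limits, so a warning is a reason to look again at the decomposition, not proof that it is wrong.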
有了方法论的建议,您可以对精心设计的系统所具有的品质进行一些全面的观察。偏离这些观察结果可能表明功能分解仍在继续,或者至少分解不成熟,您只捕获了一些明显的波动,而忽略了其他波动。
Armed with the recommendations of The Method, you can make some sweeping observations about the qualities you expect to see in a well-designed system. Deviating from these observations may indicate a lingering functional decomposition or at least an unripe decomposition in which you have encapsulated a few of the glaring volatilities but have missed others.
在设计良好的系统中,波动性应该从上到下逐层降低。客户端的波动性非常大。一些客户可能希望客户端以这种方式运行,其他客户可能希望客户端以那种方式运行,而还有一些客户可能希望在不同的设备上使用相同的东西。这种自然高水平的波动性与底层系统所需的行为无关。管理器会发生变化,但变化幅度不如其客户端大。当用例(系统所需的行为)发生变化时,管理器也会发生变化。引擎的波动性比管理器小。要使引擎发生变化,您的企业必须改变其执行某些活动的方式,这比改变活动顺序更为少见。ResourceAccess服务的波动性甚至比引擎小。您多久更改一次访问资源的方式,或者就此而言,更改资源?您可以更改活动及其顺序,而无需更改原子业务动词到资源的映射。资源是最不不稳定的组件,与系统的其余部分相比,其变化速度非常缓慢。
In a well-designed system, volatility should decrease top-down across the layers. Clients are very volatile. Some customers may want the Clients this way, other customers would want the Clients that way, and others may want the same thing but on a different device. This naturally high level of volatility has nothing to do with the required behavior of the underlying system. Managers do change, but not as much as their Clients. Managers change when the use cases—the required behavior of the system—change. Engines are less volatile than Managers. For an Engine to change, your business must change the way it is performing some activity, which is more uncommon than changing the sequencing of activities. ResourceAccess services are even less volatile than Engines. How often do you change the way you access a Resource or, for that matter, change the Resource? You can change activities and their sequence without ever changing the mapping of the atomic business verbs to Resources. Resources are the least volatile components, changing at a glacial pace compared with the rest of the system.
波动性逐层降低的设计非常有价值。较低层的组件有更多依赖项。如果您最依赖的组件也是最不稳定的,那么您的系统就会崩溃。
A design in which the volatility decreases down the layers is extremely valuable. The components in the lower layers have more items that depend on them. If the components you depend upon the most are also the most volatile, your system will implode.
重用性与波动性不同,它应当沿着层次向下增加。客户端几乎从不重用。客户端应用程序通常是为特定类型的平台和市场开发的,并且无法重用。例如,Web 门户中的代码无法轻易地在桌面应用程序中重用,而桌面应用程序无法在移动设备中重用。管理器是可重用的,因为您可以在多个客户端中使用相同的管理器和用例。引擎比管理器更具可重用性,因为同一个引擎可以由多个管理器在不同的用例中调用来执行相同的活动。ResourceAccess组件的可重用性很高,因为它们可以由引擎和管理器调用。资源是任何设计良好的系统中可重用性最高的元素。在新的设计中重用现有资源的能力通常是企业批准新系统实施的关键因素。
Reuse, unlike volatility, should increase going down the layers. Clients are hardly ever reusable. A Client application is typically developed for a particular type of platform and market and cannot be reused. For example, the code in a web portal cannot easily be reused in a desktop application, and the desktop application cannot be reused in a mobile device. Managers are reusable because you can use the same Manager and use cases from multiple Clients. Engines are even more reusable than Managers because the same Engine could be called by multiple Managers, in different use cases, to perform the same activity. ResourceAccess components are very reusable because they can be called by Engines and Managers. The Resources are the most reusable element in any well-designed system. The ability to reuse existing Resources in a new design is often a key factor in business approval of a new system’s implementation.
管理器可以分为三类:昂贵的、可消耗的和几乎可消耗的。您可以通过您被要求更改管理器时的反应来区分管理器属于哪一类。如果您的反应是抵制变更、害怕变更成本、反对变更等等,那么该管理器显然是昂贵的且不可消耗的。昂贵的管理器表明管理器太大,可能是由于功能分解造成的。如果您对变更请求的反应只是耸耸肩,不以为意,那么该管理器就是传递型和可消耗型。可消耗管理器始终是设计缺陷和架构的扭曲。它们通常只是为了满足设计准则而存在,而不需要封装用例波动性。
Managers can fall into one of three categories: expensive, expendable, and almost expendable. You can distinguish the category to which a Manager belongs by the way you respond when you are asked to change it. If your response is to fight the change, to fear its cost, to argue against the change, and so forth, then the Manager was clearly expensive and not expendable. An expensive Manager indicates that the Manager is too big, likely due to functional decomposition. If your response to the change request is just to shrug it off, thinking little of it, the Manager is pass-through and expendable. Expendable Managers are always a design flaw and a distortion of the architecture. They often exist only to satisfy the design guidelines without any real need for encapsulating use case volatility.
但是,如果您对所提议的Manager更改的反应是深思熟虑的,导致您仔细考虑使Manager适应用例更改的具体方法(甚至可能快速估计所需的工作量),那么Manager几乎是可有可无的。如果Manager只是协调Engines和ResourceAccess,封装序列波动性,您拥有一个出色的Manager服务,尽管它几乎是可消耗的。设计良好的Manager服务应该几乎是可消耗的。
If, however, your response to the proposed Manager change is contemplative, causing you to think through the specific ways of adapting the Manager to the change in the use case (perhaps even quickly estimating the amount of work required), the Manager is almost expendable. If the Manager merely orchestrates the Engines and the ResourceAccess, encapsulating the sequence volatility, you have a great Manager service, albeit an almost expendable one. A well-designed Manager service should be almost expendable.
Managers 、Engines和ResourceAccess本身都是服务。 Manager 、 Engines 和 ResourceAccess 之间的紧密交互可能构成对外部消费者的单一逻辑服务。 您可以将这样一组交互服务视为逻辑子系统。 您可以将它们组合在一起作为系统的垂直切片(图 3-8 ),其中每个垂直切片实现一组相应的用例。
The Managers, Engines, and ResourceAccess are all services in their own right. A cohesive interaction between the Manager, Engines, and ResourceAccess may constitute a single logical service to external consumers. You can view such a set of interacting services as a logical subsystem. You can group these together as a vertical slice of your system (Figure 3-8), where each vertical slice implements a corresponding set of use cases.
图 3-8子系统作为垂直切片
Figure 3-8 Subsystems as vertical slices
避免将系统过度划分为子系统。大多数系统应该只有少数几个子系统。同样,您应将每个子系统的管理器数量限制为三个。这也允许您稍微增加系统中所有子系统的管理器总数。
Avoid over-partitioning your system into subsystems. Most systems should have only a handful of subsystems. Likewise, you should limit the number of Managers per subsystem to three. This also allows you to somewhat increase the total number of Managers in the system across all subsystems.
如果系统相对简单且规模较小,则系统的业务价值(即用例的执行)可能需要架构的所有组件。对于这样的系统,只发布Engines或ResourceAccess组件是没有意义的。
If the system is relatively simple and small, the business value of the system—that is, the execution of the use cases—will likely require all components of the architecture. For such systems, there is no sense in releasing, say, just the Engines or the ResourceAccess components.
对于大型系统,某些子系统(如图3-8中的垂直切片)可能独立存在并提供直接的商业价值。此类系统的构建成本更高,完成时间更长。在这种情况下,分阶段开发和交付系统(一次一个切片)是合理的,而不是在项目结束时提供单一版本。此外,客户将能够就增量版本向开发人员提供早期反馈,而不是在最后才提供完整的系统。
With a large system, it could be that certain subsystems (such as the vertical slices of Figure 3-8) can stand alone and provide direct business value. Such systems will be more expensive to build and take longer to complete. In such cases it makes sense to develop and deliver the system in stages, one slice at a time, as opposed to providing a single release at the end of the project. Moreover, the customer will be able to provide early feedback to the developers on the incremental releases as opposed to only the complete system at the end.
无论是小型还是大型系统,正确的构建方法是另一条普遍原则:
With both small and large systems, the right approach to construction is another universal principle:
迭代设计,逐步构建。
Design iteratively, build incrementally.
无论领域和行业如何,这一原则都是正确的。例如,假设您希望在购买的一块土地上建造房屋。即使是最好的建筑师也无法在一次会议中为您的房屋设计出设计方案。在定义问题并讨论资金、居住者、风格、时间和风险等约束条件时,会有一些反复。您将从对蓝图的一些粗略修改开始,然后对其进行改进,评估其影响并研究替代方案。经过几次这样的迭代,设计将趋于一致。到了建造房屋的时候,您也会迭代地这样做吗?您会从双人帐篷开始,然后将其扩展到四人帐篷,然后是小棚屋,然后是小房子,最后是更大的房子吗?甚至考虑这种方法都是疯狂的。相反,您可能会挖掘和浇筑地基,然后将墙壁竖到一楼,然后将公用设施连接到结构,然后添加二楼,最后添加屋顶。简而言之,您可以逐步建造一座简单的房子。对于未来的房主来说,仅仅拥有地基或屋顶是没有任何价值的。也就是说,房子就像一个逐步建造的简单软件系统一样,在完成之前没有真正的价值。但是,如果建筑物有多个楼层(或多个翼楼),则可以逐步建造并提供中间价值。您的设计可能允许您一次完成一层(或一次完成一个翼楼),类似于大型软件系统的“一次一片”方法。
This principle is true regardless of domain and industry. For example, suppose you wish to build your house on a plot of land you have purchased. Even the best architect will not be able to produce the design for your house in a single session. There will be some back-and-forth as you define the problem and discuss constraints such as funds, occupants, style, time, and risk. You will start with some rough cuts to the blueprints, refine them, evaluate the implications, and examine alternatives. After several of these iterations, the design will converge. When it is time to build the house, will you do that iteratively, too? Will you start with a two-person tent, grow it out to a four-person tent, then to a small shed, then to a small house, and finally to a bigger house? It would be insane to even contemplate such an approach. Instead, you are likely to dig and cast the foundation, then erect the walls to the first floor, then connect utilities to the structure, then add the second floor, and finally add the roof. In short, you build a simple house incrementally. There is no value for the prospective homeowner in having just the foundations or the roof. That is, the house—like an incrementally built simple software system—has no real value until complete. However, if the building has multiple floors (or multiple wings), it may be possible to build it incrementally and deliver intermediate value. Your design may allow you to complete one floor at a time (or one wing at a time), similar to the “one slice at a time” approach to a large software system.
另一个例子是组装汽车。虽然汽车公司可能有一个设计团队设计汽车,经过多次迭代,但到了制造汽车的时候,制造过程并不是从滑板开始,然后发展成踏板车、自行车、摩托车,最后是汽车。相反,汽车是逐步制造的。首先,工人将底盘焊接在一起,然后用螺栓固定发动机缸体,然后添加座椅、外壳和轮胎。他们给汽车上漆,添加仪表板,最后安装内饰。
Another example is assembling cars. While the car company may have had a team of designers designing a car across multiple iterations, when it is time to build the car, the manufacturing process does not start with a skateboard, grow that to a scooter, then a bicycle, then a motorcycle, and finally a car. Instead, a car is built incrementally. First, the workers weld a chassis together, then they bolt on the engine block, and then they add the seats, the skin, and the tires. They paint the car, add the dashboard, and finally install the upholstery.
有两个原因导致你只能逐步构建,而不能迭代构建。首先,迭代构建非常浪费且困难(将摩托车变成汽车比仅仅制造汽车困难得多)。其次,更重要的是,中间迭代没有任何商业价值。如果客户想要一辆汽车送孩子上学,那么客户要一辆摩托车做什么呢?为什么客户要为此付钱?
There are two reasons why you can build only incrementally, and not iteratively. First, building iteratively is horrendously wasteful and difficult (turning a motorcycle into a car is much more difficult than just building a car). Second, and much more importantly, the intermediate iterations do not have any business value. If the customer wants a car to take the kids to school, what would the customer do with a motorcycle and why should the customer pay for it?
分阶段建造还可以让您适应时间和预算的限制。如果您设计了一栋四层楼的梦想之家,但您只能负担得起一栋单层楼的房子,那么您有两个选择。第一个选择仍然是用单层楼的预算建造一栋四层楼的房子,使用胶合板做所有墙壁,用塑料板做窗户,用水桶做浴室,用泥土做地板,用茅草做屋顶。第二个选择是只好好建造四层楼房子的一楼。当您积累了额外的资金时,您就可以建造二楼和三楼。十年后,当您最终完成建筑时,该建筑仍然与原始建筑相匹配。
Building incrementally also allows you to accommodate constraints on your time and budget. If you design a four-story dream house but you can afford only a single-floor house, you have two options. The first option is to still build a four-story house with a single-floor budget by using plywood for all walls, sheet plastic for windows, buckets for a bathroom, dirt for floors, and a thatched roof. The second option is to properly build just the first floor of the four-story house. When you accumulate additional funds, you can then construct the second and third floors. A decade later, when you finally complete the building, that construction still matches the original architecture.
在架构范围内逐步构建的能力取决于架构的恒定性和真实性。使用功能分解,您将面对不断变化的碎片堆。可以公平地假设,那些只知道功能分解的人注定要进行迭代构建。使用基于波动性的分解,您有机会做对。
The ability to build incrementally over time, within the confines of the architecture, is predicated on the architecture remaining constant and true. With functional decomposition, you face ever-shifting piles of debris. It is fair to assume that those who know only functional decomposition are condemned to iterative construction. With volatility-based decomposition, you have a chance of getting it right.
系统的垂直切片还使您能够适应可扩展性。扩展任何系统的正确方法不是打开它并敲打现有组件。如果您已正确设计了可扩展性,则可以基本不改变现有内容并扩展整个系统。继续以房屋为例,如果您希望在将来某个时候为单层房屋添加第二层,那么第一层必须设计为能够承受额外负荷,管道必须以可以扩展到第二层的方式安装,依此类推。通过拆除第一层然后建造新的第一层和第二层来添加第二层称为返工,而不是可扩展性。基于方法的系统的设计面向可扩展性:只需添加更多这些切片或子系统即可。
The vertical slices of the system also enable you to accommodate extensibility. The correct way of extending any system is not by opening it up and hammering on existing components. If you have designed correctly for extensibility, you can mostly leave existing things alone and extend the system as a whole. Continuing the house analogy, if you want to add a second floor to a single-story house at some point in the future, then the first floor must have been designed to carry the additional load, the plumbing must have been done in a way that could be extended to the second floor, and so on. Adding a second floor by destroying the first floor and then building new first and second floors is called rework, not extensibility. The design of a Method-based system is geared toward extensibility: Just add more of these slices or subsystems.
我被誉为微服务的先驱之一。早在 2006 年,我就在演讲和写作中呼吁构建每个类都是一个服务的系统。[2, 3] 这需要使用一种能够支持如此细粒度地使用服务的技术。我当时扩展了 Windows Communication Foundation (WCF) 来做到这一点,将每个类都视为服务,同时保持类的传统编程模型。[4] 我从来没有把这些服务称为“微服务”。那时和现在一样,我也不认为微服务概念存在。没有微服务,只有服务。例如,我车上的水泵为我的车提供了一项关键服务,而这个水泵只有 8 英寸长。当地自来水公司用来向我所在城镇供水的水泵为该镇提供了一项非常有价值的服务,但它有 8 英尺长。更大的水泵的存在并不会突然把我车上的水泵变成微型水泵:它仍然只是一个水泵。服务就是服务,无论其大小。要理解微服务这一人造概念的起源,您必须反思面向服务的历史。
I am credited as one of the pioneers of microservices. As early as 2006, in my speaking and writing I called for building systems in which every class was a service. [2, 3] This requires the use of a technology that can support such granular use of services. I extended Windows Communication Foundation (WCF) at the time to do just that, taking every class and treating it as a service while maintaining the conventional programming model of classes. [4] I never called these services “microservices.” Then, as now, I did not think the microservices concept existed. There are no microservices, only services. For example, the water pump in my car provides a critical service to my car, and that pump is only 8 inches long. The water pump that the local water company uses to push water to my town provides the town with a very valuable service, but it is 8 feet long. The existence of a larger pump does not suddenly transform the pump in my car into a micropump: It is still just a pump. Services are services regardless of their size. To understand the origin of the artificial concept of microservices, you have to reflect on the history of service-orientation.
2. https://wikipedia.org/wiki/Microservices#History
3. Juval Löwy,《编程 WCF 服务》,第 1 版。(O'Reilly Media,2007 年),543–553。
3. Juval Löwy, Programming WCF Services, 1st ed. (O’Reilly Media, 2007), 543–553.
4. Löwy,《WCF 服务编程》,第 1 版,第 48-51 页;Juval Löwy,《WCF 服务编程》,第 3 版(O'Reilly Media,2010 年),第 74-75 页。
4. Löwy, Programming WCF Services, 1st ed., pp. 48–51; Juval Löwy, Programming WCF Services, 3rd ed. (O’Reilly Media, 2010), 74–75.
在面向服务理念刚刚兴起的 21 世纪初期,许多组织只是将整个系统作为一项服务进行公开。由此产生的庞大整体由于其复杂性而无法维护和扩展。经过 10 年的痛苦挣扎,业界认识到了这种方法的错误,并开始呼吁更精细地使用服务,即所谓的微服务。在常见用法中,微服务对应于域或子系统,即图 3-8中的切片(红色框)。如今,这个想法的实践存在三个问题。
At the dawn of service-orientation, in the early 2000s, many organizations simply exposed their system as a whole as a service. The resulting monstrous monolith was impossible to maintain and extend due to its complexity. Some 10 years of agony later, the industry recognized the error of this approach and started calling for more granular use of services, which it dubbed microservices. In common usage, microservices correspond to domains or subsystems—that is, to the slices (red boxes) of Figure 3-8. There are three problems with this idea as practiced today.
The first problem is the implied constraint on the number of services. If smaller services are better than larger services, why stop at the subsystem level? The subsystem is still too big as the most granular service unit. Why not have the building blocks of the subsystem be services? You should push the benefits of services as far down the architecture as possible. In a Method subsystem, the Manager, Engine, and ResourceAccess components within a subsystem must be services as well.
The second problem is the widespread use of functional decomposition in microservice design by the industry at large. This factor alone will doom every nascent microservices effort. Those attempting to construct microservices will have to contend with the complexity of both functional decomposition and service-orientation without gaining any of the benefits of the modularity of the services. This double punch may be more than what most projects can handle. Indeed, I fear that microservices will be the biggest failure in the history of software. Maintainable, reusable, extensible services are possible—just not in this way.
The third problem relates to communication protocols. Although the choice of communication protocols has more to do with detailed design than with architecture, the effect of the choice is worth a passing comment here. The vast majority of microservice stacks (as of this writing) use REST/WebAPI and HTTP to communicate with the services. Most technology vendors and consultants endorse this practice across the board (perhaps because it makes their life easier if everyone uses the lowest common denominator). These protocols, however, were designed for publicly facing services, as the gateway to systems. As a general principle, in any well-designed system you should never use the same communication mechanism both internally and externally.
For example, my laptop has a drive that provides it with a very important service: storage. The laptop also consumes a service offered by the network router for all DNS requests, and an SMTP server that offers email service. For the external services, the laptop uses TCP/IP; for the internal services like the drive, it uses SATA. The laptop utilizes multiple such specialized internal protocols to perform its essential functions.
Another example is the human body. Your liver provides you with a very important service: metabolism. Your body also provides a valuable service to your customers and organization, and you use a natural language (English) to communicate with them. However, you do not speak English to communicate with your liver. Instead, you use nerves and hormones.
The protocol used for external services is typically low bandwidth, slow, expensive, and error prone. Such attributes indicate a high degree of decoupling. Unreliable HTTP may be perfect for external services, but this protocol should be avoided between internal services where the communication and the services must be impeccable.
Using the wrong protocol between services can be fatal. It is not the end of the world if you cannot talk with your boss or have a misunderstanding with a customer, but you will die if you cannot communicate correctly or at all with your liver.
Similar level-of-service issues exist with specialization and efficiency. Using HTTP between internal services is akin to using English to control your body’s internal services. Even if the words were perfectly heard and understood, English lacks the adaptability, performance, and vocabulary required for describing the internal services’ interactions.
Internal services such as Engines and ResourceAccess should rely on fast, reliable, high-performance communication channels. These include TCP/IP, named pipes, IPC, Domain Sockets, Service Fabric remoting, custom in-memory interception chains, message queues, and so on.
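To make the contrast with HTTP concrete, here is a minimal sketch of two internal services exchanging compact binary messages over a local socket pair. The framing (a 4-byte length prefix), the function names, and the `calculate:` request format are all assumptions made for illustration; they do not come from any particular platform, and a real system would more likely rely on the remoting facilities of its service framework.

```python
import socket
import struct


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Send one message framed with a 4-byte big-endian length prefix."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def _receive_exactly(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes, or fail if the peer closes early."""
    chunks = []
    remaining = count
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the channel mid-message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def receive_message(sock: socket.socket) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = struct.unpack("!I", _receive_exactly(sock, 4))
    return _receive_exactly(sock, length)


def demo() -> bytes:
    """A hypothetical Manager asks a hypothetical Engine for a result."""
    manager_end, engine_end = socket.socketpair()
    try:
        send_message(manager_end, b"calculate:42")
        request = receive_message(engine_end)
        send_message(engine_end, b"result:" + request.split(b":")[1])
        return receive_message(manager_end)
    finally:
        manager_end.close()
        engine_end.close()
```

The point of the sketch is only that an internal channel can be narrow and specialized: there are no headers to parse and no text encoding to negotiate, just the bytes the two services agreed on.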
Any layered architecture can have one of two possible operational models: open or closed. This section contrasts the two alternatives. From this discussion, you can glean some additional design guidelines in the context of the classification of services.
In an open architecture, any component can call any other component regardless of the layer in which the components reside. Components can call up, sideways, and down as much as you like. Open architectures offer the ultimate flexibility. However, an open architecture achieves that flexibility by sacrificing encapsulation and introducing a significant amount of coupling.
For example, imagine in Figure 3-4 that the Engines directly call the Resources. While such a call is technically possible, when you wish to switch Resources or merely change the way you access a Resource, suddenly all your Engines must change. How about the Clients calling ResourceAccess services directly? While that is not as bad as calling the Resources themselves, all the business logic must migrate to the Clients. Any change to the business logic would then force reworking the Clients.
Calling up a layer is also inadvisable. In Figure 3-4, what if a Manager called a Client to update some control in the UI? Now as the UI changes, the Manager must respond to that change, too. You have imported the volatility of the Clients to the Managers.
Calling sideways (intra-layer) also creates an inordinate amount of coupling. Imagine Manager A calling Manager B in Figure 3-4. In this case, Manager B is just an activity inside a use case executed by Manager A. Managers are supposed to encapsulate a set of independent use cases. Are the use cases of Manager B now independent of those of Manager A? Any change to Manager B’s way of doing the activity will break Manager A, calling to mind the issues of Figure 2-5. Calling sideways in this way is almost always the result of functional decomposition at the Managers level.
How about Engine A calling Engine B? Was Engine B a separate volatile activity from Engine A? Again, functional decomposition is likely behind the need to chain the Engines calls.
When using open architecture, there is hardly any benefit of having architectural layers in the first place. In general, in software engineering, trading encapsulation for flexibility is a bad trade.
In a closed architecture, you strive to maximize the benefits of the layers by disallowing calling up between layers and sideways within layers. Disallowing calling down between layers would maximize the decoupling between the layers but produce a useless design. A closed architecture opens a chink in the layers, allowing components in one layer to call those in the adjacent lower layer. The components within a layer are of service to the components in the layer immediately above them, but they encapsulate whatever happens underneath. Closed architecture promotes decoupling by trading flexibility for encapsulation. In general, that is a better trade than the other way around.
It is easy to point out the clear problems with open architectures—of allowing calling up, down, or sideways. However, are all three sins equally bad? The worst of them is calling up: That not only creates cross-layer coupling, but also imports the volatility of a higher layer to the lower layers. The second worst offender is calling sideways because such calls couple components inside the layer. The closed architecture allows calling one layer below, but what about calling multiple layers down? A semi-closed/semi-open architecture allows calling more than one layer down. This, again, is a trade of encapsulation for flexibility and performance and, in general, is a trade to avoid.
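The four kinds of calls discussed so far can be told apart mechanically from the positions of the caller and the callee in the layer stack. The following is a minimal sketch under simplifying assumptions: the layer names and the `classify_call` helper are illustrative, and the sketch models only the strict layering rules, with no allowance for components that sit outside the layers.

```python
from enum import Enum

# Layers listed from top to bottom. The names are illustrative only.
LAYERS = ["Clients", "Business Logic", "ResourceAccess", "Resources"]


class CallKind(Enum):
    UP = "up"
    SIDEWAYS = "sideways"
    DOWN_ADJACENT = "down one layer"
    DOWN_MULTIPLE = "down multiple layers"


def classify_call(caller_layer: str, callee_layer: str) -> CallKind:
    """Classify a call by how far down the stack the callee sits."""
    distance = LAYERS.index(callee_layer) - LAYERS.index(caller_layer)
    if distance < 0:
        return CallKind.UP
    if distance == 0:
        return CallKind.SIDEWAYS
    if distance == 1:
        return CallKind.DOWN_ADJACENT
    return CallKind.DOWN_MULTIPLE


def complies_with_closed_architecture(calls) -> bool:
    """A strictly closed design allows only calls to the adjacent lower layer."""
    return all(
        classify_call(caller, callee) is CallKind.DOWN_ADJACENT
        for caller, callee in calls
    )
```

A semi-closed/semi-open design would additionally accept `CallKind.DOWN_MULTIPLE`; an open design accepts everything, which is why the layers then buy you so little.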
Notably, the use of semi-closed/semi-open architecture is justified in two classic cases. The first case occurs when you design some key piece of infrastructure, and you must squeeze every ounce of performance from it. In such a case, transitioning down multiple layers may adversely affect performance. For example, consider the Open Systems Interconnection (OSI) model of seven layers for network communication.[5] When vendors implement this model in their TCP stack, they cannot afford the performance penalty incurred by seven layers for every call, and they sensibly choose a semi-closed/semi-open architecture for the stack. The second case occurs within a codebase that hardly ever changes. The loss in encapsulation and the additional coupling in such a codebase is immaterial because you will not have to maintain the code much, if at all. Again, a network stack implementation is a good example of code that hardly ever changes.
5. https://en.wikipedia.org/wiki/OSI_model
Semi-closed/semi-open architectures do have their place. Nevertheless, most systems do not need the level of performance that would justify such designs, and their codebase is never that immutable.
For real-life business systems, the best choice is always a closed architecture. The discussion in the previous sections of the open and semi-open options should discourage you from any other choice.
While closed architecture systems are the most decoupled and the most encapsulated, they are also the least flexible. This inflexibility could lead to Byzantine-like levels of complexity due to the indirections and intermediacy, and rigid design is inadvisable. The Method relaxes the rules of closed architecture to reduce complexity and overhead without compromising encapsulation or decoupling.
In a closed architecture, Utilities pose a challenge. Consider Logging, a service used for recording run-time events. If you classify Logging as a Resource, then the ResourceAccess can use it, but the Managers cannot. If you place Logging at the same level as the Managers, only the Clients can log. The same goes for Security or Diagnostics—services that almost all other components require. In short, there is no good location for Utilities among the layers of a closed architecture. The Method places Utilities in a vertical bar on the side of the layers (see Figure 3-4). This bar cuts across all layers, allowing any component in the architecture to use any Utility.
You may see attempts by some developers to abuse the utilities bar by christening as a Utility any component they wish to short-circuit across all layers. Not all components can reside in the utilities bar. To qualify as a Utility, the component must pass a simple litmus test: Can the component plausibly be used in any other system, such as a smart cappuccino machine? For example, a smart cappuccino machine could use a Security service to see if the user can drink coffee. Similarly, the cappuccino machine may want to log how much coffee the office workers drink, have diagnostics, and be able to use the Pub/Sub service to publish an event notifying that it is running low on coffee. Each of these needs justifies encapsulation in a Utility service. In contrast, you will be hard-pressed to explain why a cappuccino machine has a mortgage interest calculating service as a Utility.
This next guideline may be implied, but it is important enough to state explicitly. Because they are in the same layer, both Managers and Engines can call ResourceAccess services without violating the closed architecture (see Figure 3-4). Allowing Managers to call ResourceAccess is also implied from the section defining Managers and Engines. A Manager that uses no Engines must be able to access the underlying Resources.
Managers can directly call Engines. The separation between Managers and Engines is almost at the detailed design level. Engines are really just an expression of the Strategy design pattern [6] used to implement the activities within the Managers’ workflows. Therefore, Manager-to-Engine calls are not truly sideways calls, as is the case with Manager-to-Manager calls. Alternatively, you can think of Engines as residing in a different or orthogonal plane to the Managers.
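The Strategy relationship between a Manager and its Engines can be sketched in a few lines. This is an illustration only: `PricingEngine`, `FlatFeeEngine`, `PercentageEngine`, and `OrderManager` are hypothetical names invented for the example, and the pricing rules are placeholders for whatever volatile activity an Engine encapsulates.

```python
from typing import Protocol


class PricingEngine(Protocol):
    """The volatile activity, expressed as an interchangeable strategy."""

    def calculate(self, amount: float) -> float: ...


class FlatFeeEngine:
    def __init__(self, fee: float) -> None:
        self._fee = fee

    def calculate(self, amount: float) -> float:
        return amount + self._fee


class PercentageEngine:
    def __init__(self, rate: float) -> None:
        self._rate = rate

    def calculate(self, amount: float) -> float:
        return amount * (1 + self._rate)


class OrderManager:
    """Owns the use case sequence and delegates one activity to an Engine."""

    def __init__(self, pricing: PricingEngine) -> None:
        self._pricing = pricing

    def place_order(self, amount: float) -> str:
        total = self._pricing.calculate(amount)  # the pluggable step
        return f"order placed, total={total:.2f}"
```

Swapping `FlatFeeEngine` for `PercentageEngine` changes how the activity is performed without touching the workflow in `OrderManager`, which is why the call reads as an implementation detail of the Manager rather than as a call to a peer.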
6. Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software (Addison-Wesley, 1994).
While Managers should not call directly sideways to other Managers, a Manager can queue a call to another Manager. There are actually two explanations—a technical one and a semantic one—why this does not violate the closed architecture principle.
The technical explanation involves the very mechanics of a queued call. When a client calls a queued service, the client interacts with a proxy to the service, which then deposits the message into a message queue for the service. A queue listener entity monitors the queue, detects the new message, picks it off the queue, and calls the service. Using The Method structure, when a Manager queues a call to another Manager, the proxy is a ResourceAccess to the underlying Resource, the queue; that is, the call actually goes down, not sideways. The queue listener is effectively another Client in the system, and it is also calling downward to the receiving Manager. No sideways call actually takes place.
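The mechanics just described can be sketched with an in-process queue standing in for a real message queue. Everything here is an assumption made for illustration: the class names (`QueueAccess`, `SalesManager`, `AnalysisManager`, `QueueListener`) are hypothetical, and a production system would use a durable queue and a framework-provided listener rather than `queue.Queue`.

```python
import queue


class AnalysisManager:
    """The receiving Manager; it knows nothing about who queued the call."""

    def __init__(self) -> None:
        self.analyzed = []

    def analyze(self, snapshot: str) -> None:
        self.analyzed.append(snapshot)


class QueueAccess:
    """Plays the proxy role: a ResourceAccess in front of the queue Resource."""

    def __init__(self, message_queue: queue.Queue) -> None:
        self._queue = message_queue

    def enqueue_analysis(self, snapshot: str) -> None:
        self._queue.put(("analyze", snapshot))


class SalesManager:
    """The calling Manager; its call goes down to QueueAccess, not sideways."""

    def __init__(self, queue_access: QueueAccess) -> None:
        self._queue_access = queue_access

    def close_sale(self, sale_id: str) -> None:
        # ... the steps of the use case would run here ...
        self._queue_access.enqueue_analysis(f"state-of-{sale_id}")


class QueueListener:
    """Effectively another Client: it calls down into the receiving Manager."""

    def __init__(self, message_queue: queue.Queue, manager: AnalysisManager) -> None:
        self._queue = message_queue
        self._manager = manager

    def drain(self) -> int:
        handled = 0
        while True:
            try:
                operation, argument = self._queue.get_nowait()
            except queue.Empty:
                return handled
            getattr(self._manager, operation)(argument)
            handled += 1
```

Note that `SalesManager` holds no reference to `AnalysisManager`, and nothing happens in the receiving Manager until the listener runs, possibly much later.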
The semantic explanation involves the nature of use cases. Business systems quite commonly have one use case that triggers a latent, much-deferred execution of another use case. For example, imagine a system in which a Manager executing a use case must save some system state for analysis at the end of the month. Without interrupting its flow, the Manager could queue the analysis request to another Manager. The second Manager could dequeue at the month’s end and perform its analysis workflow. The two use cases are independent and decoupled on the timeline.
Even with the best set of guidelines, time and again you will find developers trying to open the architecture by calling sideways, calling up, or committing some other violation of a closed architecture. Do not brush these transgressions aside or demand blind compliance with the guidelines. Nearly always, the discovery of such a transgression indicates some underlying need that made developers violate the guidelines. You must address that need correctly, in a way that complies with the closed architecture principle. For example, suppose that during design or code review, you discover one Manager directly calling another Manager. The developer may attempt to justify the sideways call by pointing out some requirement for another use case to execute in response to the original use case. It is very unlikely, however, that the response of the second Manager must occur immediately. Queuing the inter-Manager call would both be a better design and avoid the sideways call.
In another review, suppose you detect a Manager calling up to a Client—a gross violation of the closed architecture principle. As justification, the developer points to a requirement to notify the Client when something happens. While a valid requirement, calling up is not an acceptable resolution. Over time, other Clients could need the notification, or other Managers might need to notify the Client. What you have unearthed is volatility in who notifies and volatility in who receives that event. You should encapsulate that volatility by using a Pub/Sub service from the utilities bar. The Manager can, of course, call that Utility. In the future, adding other subscribing Clients or publishing Managers is a trivial task and would have no undesired repercussions in the system.
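A minimal in-process sketch shows how a Pub/Sub Utility removes the upward call. The names `PubSub`, `TradeManager`, `DesktopClient`, and the `trade_completed` event are hypothetical, chosen only for the example; a real Pub/Sub service would typically be distributed and durable.

```python
from collections import defaultdict
from typing import Callable


class PubSub:
    """A toy Utility: maps event names to subscriber callbacks."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def subscribe(self, event_name: str, handler: Callable[[str], None]) -> None:
        self._handlers[event_name].append(handler)

    def publish(self, event_name: str, payload: str) -> None:
        # Copy the list so that handlers may subscribe while being notified.
        for handler in list(self._handlers.get(event_name, [])):
            handler(payload)


class TradeManager:
    """Publishes through the Utility and never references a Client."""

    def __init__(self, pub_sub: PubSub) -> None:
        self._pub_sub = pub_sub

    def complete_trade(self, trade_id: str) -> None:
        # ... the steps of the use case would run here ...
        self._pub_sub.publish("trade_completed", trade_id)


class DesktopClient:
    """Subscribes to the event; the Manager is unaware that it exists."""

    def __init__(self, name: str, pub_sub: PubSub) -> None:
        self.name = name
        self.notifications = []
        pub_sub.subscribe("trade_completed", self.on_trade_completed)

    def on_trade_completed(self, trade_id: str) -> None:
        self.notifications.append(f"{self.name}: trade {trade_id} completed")
```

Adding a second `DesktopClient`, or a second publishing Manager, requires no change to any existing class, which is the encapsulation of volatility the review was after.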
With the definitions in place for both the services and the layers, it is also possible to compile a list of things to avoid—the design “don’ts.” Some of the items on the list may appear obvious to you after the previous sections, yet I have seen them often enough to conclude they were not obvious after all. The main reason people go against a “don’t” guideline is that they have created a functional decomposition and have managed to convince themselves that it is not functional.
If you do one of the things on this list, you will likely live to regret it. Treat any violation of these rules as a red flag and investigate further to see what you are missing:
Clients do not call multiple Managers in the same use case. Doing so implies that the Managers are tightly coupled and no longer represent separate families of use cases, or separate subsystems, or separate slices. Chained Manager calls from the Client indicate functional decomposition, requiring the Client to stitch the underlying functionalities together (see Figure 2-1). Clients can call multiple Managers but not in the same use case; for example, a Client can call Manager A to perform use case 1 and then call Manager B to perform use case 2.
Clients do not call Engines. The only entry points to the business layer are the Managers. The Managers represent the system, and the Engines are really an internal layer implementation detail. If the Clients call the Engines, use case sequencing and associated volatility are forced to migrate to the Clients, polluting them with business logic. Calls from Clients to Engines are the hallmark of a functional decomposition.
Managers do not queue calls to more than one Manager in the same use case. If there are two Managers receiving a queued call, why not a third? Why not all of them? The need to have two (or more) Managers respond to a queued call is a strong indication that more Managers (and maybe all of them) would need to respond, so you should use a Pub/Sub Utility service instead.
Engines do not receive queued calls. Engines are utilitarian and exist to execute a volatile activity for a Manager. They have no independent meaning on their own. A queued call, by definition, executes independently from anything else in the system. Performing just the activity of an Engine, disconnected from any use case or other activities, does not make any business sense.
ResourceAccess services do not receive queued calls. Very similar to the Engines guideline, ResourceAccess services exist to service a Manager or an Engine and have no meaning on their own. Accessing a Resource independently from anything else in the system does not make any business sense.
Clients do not publish events. Events represent changes to the state of the system about which Clients (or Managers) may want to know. A Client has no need to notify itself (or other Clients). In addition, knowledge of the internals of the system is often required to detect the need to publish an event—knowledge that the Clients should not have. However, with a functional decomposition, the Client is the system and needs to publish the event.
Engines do not publish events. Publishing an event requires noticing and responding to a change in the system and is typically a step in a use case executed by the Manager. An Engine performing an activity has no way of knowing much about the context of the activity or the state of the use case.
ResourceAccess services do not publish events. ResourceAccess services have no way of knowing the significance of the state of the Resource to the system. Any such knowledge or responding behavior should reside in Managers.
Resources do not publish events. The need for the Resource to publish events is often the result of a tightly coupled functional decomposition. Similar to the case for ResourceAccess, business logic of this kind should reside in Managers. As a Manager modifies the state of the system, the Manager should also publish the appropriate events.
Engines, ResourceAccess, and Resources do not subscribe to events. Processing an event is almost always the start of some use case, so it must be done in a Client or a Manager. The Client may inform a user about the event, and the Manager may execute some back-end behavior.
Engines never call each other. Not only do such calls violate the closed architecture principle, but they also do not make sense in a volatility-based decomposition. The Engine should have already encapsulated everything to do with that activity. Any Engine-to-Engine calls indicate functional decomposition.
ResourceAccess services never call each other. If ResourceAccess services encapsulate the volatility of an atomic business verb, one atomic verb cannot require another. This is similar to the rule that Engines should not call each other. Note that a 1:1 mapping between ResourceAccess and Resources (every Resource has its own ResourceAccess) is not required. Often two or more Resources logically must be joined together to implement some atomic business verbs. A single ResourceAccess service should perform the join rather than inter-ResourceAccess services calls.
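Because the rules in this list are mechanical, a design review could check them against a recorded list of interactions. The sketch below is one possible encoding, not a tool that accompanies the book: the `Interaction` record, the role names as strings, and the `find_violations` function are all assumptions made for the example.

```python
from dataclasses import dataclass

NON_PUBLISHERS = {"Client", "Engine", "ResourceAccess", "Resource"}
NON_SUBSCRIBERS = {"Engine", "ResourceAccess", "Resource"}
NO_QUEUED_CALLS = {"Engine", "ResourceAccess"}
FORBIDDEN_CALLS = {
    ("Client", "Engine"): "Clients do not call Engines",
    ("Engine", "Engine"): "Engines never call each other",
    ("ResourceAccess", "ResourceAccess"): "ResourceAccess services never call each other",
}


@dataclass(frozen=True)
class Interaction:
    use_case: str
    kind: str  # "call", "queue", "publish", or "subscribe"
    source: str  # component name
    source_role: str  # "Client", "Manager", "Engine", "ResourceAccess", "Resource"
    target: str = ""
    target_role: str = ""


def find_violations(interactions):
    """Return one message per broken rule, in the order the rules are detected."""
    violations = []
    managers_called = {}  # (use case, Client) -> Managers it calls
    managers_queued = {}  # (use case, Manager) -> Managers it queues calls to
    for item in interactions:
        roles = (item.source_role, item.target_role)
        if item.kind == "call":
            if roles in FORBIDDEN_CALLS:
                violations.append(f"{item.use_case}: {FORBIDDEN_CALLS[roles]}")
            elif roles == ("Client", "Manager"):
                managers_called.setdefault((item.use_case, item.source), set()).add(item.target)
        elif item.kind == "queue":
            if item.target_role in NO_QUEUED_CALLS:
                violations.append(f"{item.use_case}: {item.target_role} does not receive queued calls")
            elif roles == ("Manager", "Manager"):
                managers_queued.setdefault((item.use_case, item.source), set()).add(item.target)
        elif item.kind == "publish" and item.source_role in NON_PUBLISHERS:
            violations.append(f"{item.use_case}: {item.source_role} does not publish events")
        elif item.kind == "subscribe" and item.source_role in NON_SUBSCRIBERS:
            violations.append(f"{item.use_case}: {item.source_role} does not subscribe to events")
    for (use_case, client), managers in managers_called.items():
        if len(managers) > 1:
            violations.append(f"{use_case}: {client} calls multiple Managers in one use case")
    for (use_case, manager), targets in managers_queued.items():
        if len(targets) > 1:
            violations.append(f"{use_case}: {manager} queues calls to multiple Managers")
    return violations
```

A non-empty result is a red flag to investigate rather than a verdict: the message tells you which rule was broken, and the interesting question remains what underlying need led to it.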
Another universal design rule is that all good architectures are symmetric. Consider your own body. You do not have a third hand sticking up on your right side because evolutionary pressures were omnidirectional, enforcing symmetry. Evolutionary pressures apply to software systems as well, forcing the systems to respond to the changing environment or become extinct. The quest for symmetry, however, is only at the architecture level, not in detailed design. Certainly, your internal organs are not symmetric because such symmetry offered no evolutionary advantage to your ancestors (i.e., the system dies when you expose its internals).
The symmetry in software systems manifests in repeated call patterns across use cases. You should expect symmetry, and its absence is a cause for concern. For example, suppose a Manager implements four use cases, three of which publish an event with the Pub/Sub service and the fourth of which does not. That break of symmetry is a design smell. Why is the fourth case different? What are you missing or overdoing? Is that Manager a real Manager, or is it a functionally decomposed component without volatility? Symmetry can also be broken by the presence of something, not just by its absence. For example, if a Manager implements four use cases, of which only one ends up with a queued call to another Manager, that asymmetry is also a smell. Symmetry is so fundamental for good design that you should generally see the same call patterns across Managers.
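One way to look for such breaks of symmetry is to compare the call-pattern elements of each use case in a Manager and report every element that some use cases have and others lack. The sketch below assumes you can describe each use case as a set of labels such as "publish event"; the labels, the use case names, and the `find_asymmetries` function are illustrative, and what counts as a call-pattern element is a judgment call.

```python
def find_asymmetries(use_cases: dict) -> dict:
    """Report call-pattern elements present in some use cases but not in all.

    `use_cases` maps a use case name to the set of call-pattern elements it
    exhibits. Elements shared by every use case are symmetric and omitted.
    """
    all_elements = set().union(*use_cases.values())
    report = {}
    for element in sorted(all_elements):
        present = sorted(name for name, elements in use_cases.items() if element in elements)
        absent = sorted(name for name in use_cases if name not in present)
        if absent:
            report[element] = {"present": present, "absent": absent}
    return report


# A hypothetical Manager with four use cases, mirroring the two examples above.
account_manager = {
    "OpenAccount": {"publish event"},
    "CloseAccount": {"publish event"},
    "Transfer": {"publish event", "queue call to Manager"},
    "Audit": set(),
}
```

For this Manager the report singles out `Audit` as the one use case that does not publish an event and `Transfer` as the only one that queues a call, which are exactly the two smells to question during review.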
Your software system’s reason for being is to service the business by addressing its customers’ requirements and needs. The previous two chapters discussed how to decompose the system into its components to create its architecture. The decomposition into components is inherently a static layout of the system, like a blueprint. During execution, the system is dynamic, and the various components interact with each other. But how do you know the composition of these components at run-time adequately satisfies all the requirements? Validating your design has to do with requirements analysis, system design, and your added value as the architect. Design validation and composition, as you will see, are intimately related. You can and must be able to produce a viable design and validate it in a repeatable manner.
This chapter provides you with the tools to verify that the system not only addresses the current requirements but also can withstand future changes to the requirements. That objective requires first recognizing the nature of requirements and change, and how both relate to system design. This recognition, in turn, yields a fundamental observation about system design along with practical recommendations for producing a valid design.
Requirements change. Accept it—that’s what requirements do.
Requirement change is fantastic. If requirements were static, none of us would have a job: Someone, somewhere, would have developed some adequate version of the system in the past, and that system would have been in service ever since. The more requirements change, the better off everyone among the technical ranks will be. The world is heavily dependent on software, yet there are still so few developers and architects and so many of everyone else. The more the requirements change, the higher the demand for software professional services will be, and since the supply of software professionals is limited, the higher their compensation and benefits will be.
As wonderful as changes to requirements are, many in the industry have spent their entire career resenting such changes. The reason is simple: Most developers and architects design their system against the requirements. In fact, they go to great lengths to transcribe the requirements to components of the architecture. They strive to maximize the affinity between the requirements and the system design. However, when the requirements change, their design also must change. In any system, a change to the design is very painful, often destructive, and always expensive. Since nobody likes pain (even when it is self-inflicted, as in this case), people have learned to resent changes to requirements, literally resenting the hand that feeds them.
解决这种对变革不满的不和谐之感的方法非常简单,几乎每个人在整个职业生涯中都未能找到它:
The solution for this dissonance of resenting the changes is so simple that it has eluded almost everyone for their entire career:
永远不要针对需求进行设计。
Never design against the requirements.
这一简单的指令与大多数人所学和实践的内容相悖,尽管它本应是显而易见的。任何针对需求进行设计的尝试都必然会带来痛苦。既然痛苦是糟糕的,那么就不应该为做如此不明智的事情找借口。人们甚至可能完全意识到他们的设计过程无法奏效,而且从未奏效过,但由于缺乏替代方案,他们只能求助于他们所知道的唯一选择:针对需求进行设计。
This simple directive goes contrary to what most have been taught and have practiced, even though it should have been plainly evident for all to see. Any attempt at designing against the requirements will always guarantee pain. Since pain is bad, there should be no excuse for doing something that is so ill advised. People may even be fully aware that their design process cannot work and has never worked, but lacking alternatives they resort to the one option they know—to design against the requirements.
如第 3 章所述,捕获需求的正确方法是使用用例的形式:系统所需的行为集。一个体面的系统有几十个这样的用例,大型系统可能有数百个。同时,在软件历史上,没有人有时间在项目开始时正确地规范出数百个用例。
As discussed in Chapter 3, the correct way of capturing the requirements is in the form of use cases: the required set of behaviors of the system. A decent-sized system has dozens of these use cases, and large systems may have hundreds. At the same time, no one in the history of software has ever had the time to correctly spec-out hundreds of use cases at the beginning of the project.
假设在新项目的第一天,您收到一个包含 300 个用例的文件夹。您能相信这个集合是正确且完整的吗?如果得知实际数量其实是 330 个用例,而您遗漏了一些用例,您会感到惊讶吗?如果您收到 300 个用例,后来却发现实际数量其实是 200 个,因为需求规范中包含许多重复项,您会感到震惊吗?在这种情况下,如果您针对需求进行设计,您做的工作岂不是至少多出 50%?您难道不可能收到一组其中某些用例互斥的用例吗?有缺陷的用例迫使您实现错误行为的风险又如何?
Suppose that on day 1 of a new project you are given a folder with 300 use cases. Can you trust that this collection is correct and complete? Would you be surprised to learn that the real number was actually 330 use cases and that you are missing a few use cases? If you are given 300 use cases, will you be shocked to learn that the real number was actually 200 because the requirements spec contains many duplicates? In this case, if you design against the requirements, will you not be doing at least 50% more work? Is it impossible for you to receive a set of use cases in which some of the use cases are mutually exclusive? How about the risk of having defective use cases that compel you to implement the wrong behavior?
即使奇迹般地有人花费大量时间正确地捕获活动图中的全部 300 个用例,确认没有遗漏用例,协调互斥用例,并合并重复用例,这些努力也毫无价值,因为需求会发生变化。随着时间的推移,您将有新的需求,一些现有需求将被删除,而其他需求则会随之改变。简而言之,收集完整需求集并针对它们进行设计的任何尝试都是徒劳的。
Even if by some miracle someone did take the considerable time needed to correctly capture all 300 use cases in activity diagrams, to confirm there are no missing use cases, to reconcile the mutually exclusive use cases, and to consolidate the duplicate use cases, that effort will be of little value because requirements will change. Over time you will have new requirements, some existing requirements will be removed, and other requirements will just change in place. In short, any attempt at gathering the complete set of requirements and designing against them is an exercise in futility.
在规定满足需求的正确方法之前,您必须正确设置满足需求的标准。任何系统设计的目标都是能够满足所有用例。上一句中的“全部”一词实际上意味着全部:现在和将来、已知和未知的用例。这个期望就是标准。低于这个标准就不行。如果您未能通过该标准,那么在未来的某个时候,当需求发生变化时,您的设计将不得不改变。糟糕设计的标志是,当需求发生变化时,设计也必须改变。
Before prescribing the correct way of satisfying the requirements, you have to set the bar correctly for satisfying the requirements. The goal of any system design is to be able to satisfy all use cases. The word all in the previous sentence really means all: present and future, known and unknown use cases. This expectation is where the bar is set. Nothing less will do. If you fail to pass that bar, then at some point in the future, when the requirements change, your design will have to change. The hallmark of a bad design is that when the requirements change, the design has to change as well.
在任何给定系统中,并不是所有用例都是独特且唯一的。大多数用例都是其他用户用例的变体。主要的必需行为有无数种排列组合 — 例如,正常情况、不完整情况、特定区域的特定客户情况、错误情况等等。用例只有两种类型:核心用例和所有其他用例。核心用例代表系统业务的本质。如第 2 章所述,业务性质几乎不会改变,核心用例也是如此。当然,常规的非核心用例会在业务的客户之间发生很大变化。虽然您的客户可以并且可能会对常规用例有自己的定制和解释,但所有客户都共享核心用例。
In any given system, not all use cases are distinct and unique. Most of the use cases are variations of other use cases. The main required behavior has numerous permutations: for example, the normal case, the incomplete case, the case for a specific customer in a particular locale, the error case, and so on. There are only two types of use cases: core use cases and all other use cases. The core use cases represent the essence of the business of the system. As discussed in Chapter 2, the nature of the business hardly ever changes, and the same goes for the core use cases. The regular, non-core use cases will, of course, change at a great rate across and between customers of the business. While your customers can and likely will have their own customization and interpretation of the regular use cases, all customers share the core use cases.
虽然系统可能有数百个用例,但值得庆幸的是,系统只有少数几个核心用例。在 IDesign 的实践中,我们经常看到系统的核心用例少得惊人。大多数系统只有两三个核心用例,而且数量很少超过六个。回想一下办公室里的系统或您最近参与的项目,在脑海中计算一下系统需要处理的真正不同用例的数量。您会发现这个数字很少,非常少。或者,拿出系统的单页营销手册,数一数项目符号的数量。您可能不会超过三个项目符号。
While the system may have hundreds of use cases, the saving grace is that the system will have only a handful of core use cases. In our practice at IDesign, we commonly see systems with surprisingly few core use cases. Most systems have as few as two or three core use cases, and the number seldom exceeds six. Reflect on your system at the office or on a recent project with which you were involved and count in your head the number of truly distinct use cases the system is required to handle. You will find that this number is small, very small. Alternatively, bring up a single-page marketing brochure for the system and count the number of bullets. You will likely have no more than three bullets.
核心用例几乎从来不会在需求文档中明确列出,无论该文档多么精炼。核心用例数量少并不意味着容易找到,数量少也不容易与其他人达成一致,即什么是核心用例,什么是常规用例。核心用例几乎总是其他用例的某种抽象,甚至可能需要一个新的术语或名称来将其与其他用例区分开来。即使您收到的需求规范中有许多缺失的用例,这种有缺陷的文档也会包含核心用例,因为它们是业务的本质。此外,虽然您不应该针对需求进行设计,但这并不意味着您应该忽略需求。需求分析的全部目的是识别核心用例(以及易变的领域)。作为架构师(以及需求所有者),您需要识别核心用例,通常通过某种迭代过程。
The core use cases will hardly ever be presented explicitly in the requirements document, as refined as that document may be. Their small number does not mean the core use cases are easy to find, nor does the small number make it simple to agree with others on what is a core use case versus a regular use case. A core use case will almost always be some kind of an abstraction of other use cases, and it may even require a new term or name to differentiate it from the rest. Even when you are given a requirements spec that turns out to have many missing use cases, such a flawed document will contain the core use cases because they are the essence of the business. In addition, while you should not design against the requirements, that does not mean you should ignore the requirements. The whole point of requirements analysis is to recognize the core use cases (and the areas of volatility). It is up to you as the architect (along with the requirements owner) to identify the core use cases, often via some iterative process.
作为架构师,您的任务是确定可以组合在一起以满足所有核心用例的最小组件集。由于所有其他用例都只是核心用例的变体,因此常规用例仅表示组件之间的不同交互,而不是不同的分解。现在,当需求发生变化时,您的设计不会发生变化。
Your mission as an architect is to identify the smallest set of components that you can put together to satisfy all the core use cases. Since all the other use cases are merely variations of the core use cases, the regular use cases simply represent a different interaction between the components, not a different decomposition. Now when the requirements change, your design does not.
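The idea that regular use cases are different interactions between the same components, not a different decomposition, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ordering domain, the class names (CatalogAccess, PricingEngine, OrderAccess, OrderManager), the prices, and the 25% bulk discount are all hypothetical and do not come from the book.

```python
class CatalogAccess:
    """Hypothetical ResourceAccess-style component; the price data is made up."""
    _prices = {"hammer": 20.0, "drill": 80.0}

    def price_of(self, item):
        return self._prices[item]


class PricingEngine:
    """Hypothetical Engine-style component that encapsulates pricing rules."""

    def total(self, unit_price, quantity, discount=0.0):
        return unit_price * quantity * (1.0 - discount)


class OrderAccess:
    """Hypothetical ResourceAccess-style component that stores orders in memory."""

    def __init__(self):
        self.saved = []

    def save(self, item, total):
        self.saved.append((item, total))


class OrderManager:
    """Hypothetical Manager: each method is a workflow over the same components."""

    def __init__(self, catalog, pricing, orders):
        self.catalog = catalog
        self.pricing = pricing
        self.orders = orders

    def place_order(self, item, quantity):
        # Assumed core use case: look up, price, store.
        total = self.pricing.total(self.catalog.price_of(item), quantity)
        self.orders.save(item, total)
        return total

    def place_bulk_order(self, item, quantity):
        # Variation: the same components, integrated with a discount.
        total = self.pricing.total(self.catalog.price_of(item), quantity, discount=0.25)
        self.orders.save(item, total)
        return total

    def quote(self, item, quantity):
        # Another variation: price without storing anything.
        return self.pricing.total(self.catalog.price_of(item), quantity)


manager = OrderManager(CatalogAccess(), PricingEngine(), OrderAccess())
print(manager.place_order("hammer", 2))
print(manager.place_bulk_order("drill", 10))
print(manager.quote("drill", 1))
```

Adding the bulk order or the quote did not require a new component. Only the interaction inside the Manager differs, which is the property the paragraph above describes.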
我把这种方法称为可组合设计。可组合设计并不旨在满足任何特定的用例。
I call this approach composable design. A composable design does not aim to satisfy any use case in particular.
您不会特别针对任何用例,这不仅是因为您收到的用例不完整、有缺陷且漏洞百出、自相矛盾,还因为它们会发生变化。即使现有用例不会改变,随着时间的推移,您也会添加新用例并删除其他用例。
You do not target any use case in particular not just because the use cases you were given were incomplete, faulty, and full of holes and contradictions, but because they will change. Even if the existing use cases do not change, over time you will have new use cases added and others removed.
一个简单的例子就是人体设计。20多万年前,智人出现在非洲平原,当时的职位要求中还没有软件架构师。你怎么可能用狩猎采集者的身体来满足当今软件架构师的要求呢?答案是,虽然你使用与史前人类相同的组件,但你以不同的方式整合它们。单一的核心用例并没有随着时间的推移而改变:生存。
A simple example is the design of the human body. Homo sapiens appeared on the plains of Africa more than 200,000 years ago, when the requirements at the time did not include being a software architect. How can you possibly fulfill the requirements for a software architect today while using the body of a hunter-gatherer? The answer is that while you are using the same components as a prehistoric man, you are integrating them in different ways. The single core use case has not changed with time: survive.
因为任何系统的目标都是满足需求,可组合设计可以实现另一件事:设计验证。一旦您可以针对每个核心用例在服务之间生成交互,您就生成了一个有效的设计。无需了解未知事物或预测未来。您的设计现在可以处理任何用例,因为所有用例都只表现为相同构建块之间的不同交互。不要再向往某个神秘的项目了,总有一天有人会给您提供所有完整且正确记录的需求。没有必要在前期浪费大量时间试图详细确定需求。即使需求严重受损,您也可以轻松设计有效的系统。
Since the goal of any system is to satisfy the requirements, composable design enables something else: design validation. Once you can produce an interaction between your services for each core use case, you have produced a valid design. There is no need to know the unknown or to predict the future. Your design can now handle any use case, because all use cases manifest themselves only as different interactions between the same building blocks. Stop yearning for some mythical project where one day someone will give you all the requirements complete and properly documented. There is no point in wasting an inordinate amount of time up front trying to nail down the requirements in minute detail. You can easily design valid systems even with grossly impaired requirements.
验证设计的过程可以简单到只需生成简单的图表来演示支持用例的架构组件之间的交互。图 4-1按照方法的说法是调用链图。
The act of validating the design can be as straightforward as producing simple diagrams demonstrating the interactions between the components of the architecture that support the use cases. Figure 4-1 is, in The Method’s parlance, a call chain diagram.
图 4-1展示对核心用例支持的简单调用链
Figure 4-1 Simple call chain demonstrating support of a core use case
调用链展示了满足特定用例所需的组件之间的交互。您可以将调用链直接叠加到分层架构图上。图中的组件通过箭头连接,指示组件之间调用的方向和类型 - 实心黑色箭头表示同步(请求/响应)调用,虚线灰色箭头表示排队调用。调用链图是依赖关系图的特化,因此在项目设计期间非常有用(如本书后半部分所述)。
A call chain demonstrates the interaction between components required to satisfy a particular use case. You can literally superimpose the call chain onto the layered architecture diagram. The components in the diagram are connected by arrows indicating the direction and type of the call between components—a solid black arrow for synchronous (request/response) calls, and a dashed gray arrow for a queued call. Call chain diagrams are specializations of a dependency graph and as such are quite useful during project design (as discussed in the second half of this book).
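A call chain can also be written down as data and checked mechanically. The following is a rough sketch, not part of The Method: the Call record, the validate function, and all component and use case names are hypothetical. It only checks that every core use case has a call chain, that every call in it connects known components, and that each call is either synchronous or queued.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    caller: str
    callee: str
    kind: str = "sync"  # "sync" (solid arrow) or "queued" (dashed arrow)


def validate(components, call_chains, core_use_cases):
    """Return a list of problems; an empty list means every core use case is supported."""
    problems = []
    for use_case in core_use_cases:
        chain = call_chains.get(use_case, [])
        if not chain:
            problems.append(f"{use_case}: no call chain")
        for call in chain:
            for name in (call.caller, call.callee):
                if name not in components:
                    problems.append(f"{use_case}: unknown component {name}")
            if call.kind not in ("sync", "queued"):
                problems.append(f"{use_case}: unknown call kind {call.kind}")
    return problems


# Hypothetical architecture and a single hypothetical core use case.
components = {"Client", "OrderManager", "PricingEngine", "OrderAccess", "OrderStore"}
call_chains = {
    "Place order": [
        Call("Client", "OrderManager"),
        Call("OrderManager", "PricingEngine"),
        Call("OrderManager", "OrderAccess", kind="queued"),
        Call("OrderAccess", "OrderStore"),
    ],
}

print(validate(components, call_chains, ["Place order"]))
print(validate(components, call_chains, ["Place order", "Cancel order"]))
```

Like the diagram it mirrors, this representation lists the calls without modeling their timing or duration, so it shares the limitations of call chain diagrams.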
调用链图是一种简单快捷的检查用例和演示设计如何支持用例的方法。调用链图的缺点是它们没有调用顺序的概念,无法捕获调用的持续时间,并且当多方对同一类型的组件进行多次调用时,它们会变得混乱。在许多情况下,组件之间的交互可能很简单,因此您不需要显示顺序、持续时间或多次调用。对于这些情况,您可以决定调用链图足以满足验证目的。此外,非技术受众通常更容易理解调用链。
Call chain diagrams are a simple and quick way of examining a use case and demonstrating how the design supports it. The downside of call chain diagrams is that they have no notion of call order, they have no way of capturing the duration of the calls, and they get confusing when multiple parties make multiple calls to the same type of components. In many cases, the interaction between the components may be simple, so you do not need to show order, duration, or multiple calls. For these cases, you may decide that a call chain diagram is good enough for the validation purpose. Also, call chains are often easier for nontechnical audiences to understand.
方法论中的序列图类似于 UML 序列图。1但是,它包含符号差异,以确保图表类型之间具有共同的含义。生命线根据架构层着色,箭头样式与调用链图中的箭头样式相同。图 4-2是与图 4-1等效的序列图。
A sequence diagram in The Method’s parlance is similar to a UML sequence diagram.1 However, it includes notational differences to assure common meanings between diagram types. Lifelines are colored according to the architectural layers, and the arrow styles are the same as in call chain diagrams. Figure 4-2 is a sequence diagram equivalent to Figure 4-1.
图 4-2使用序列图演示对核心用例的支持
Figure 4-2 Demonstrating support for a core use case with a sequence diagram
1. https://en.wikipedia.org/wiki/Sequence_diagram
在序列图中,用例中的每个参与组件都有一个垂直条,表示其生命线。垂直条对应于组件执行的某些工作或活动。时间从图的顶部流向底部,条的长度表示组件使用的相对持续时间。单个组件可能多次参与同一个用例,并且您甚至可以为同一组件的不同实例设置不同的生命线。水平箭头(实线黑色表示同步,虚线灰色表示排队)表示组件之间的调用。
In a sequence diagram, each participating component in the use case has a vertical bar representing its lifeline. The vertical bars correspond to some work or activity that the component performs. Time flows from top to bottom of the diagram, and the length of the bars indicates the relative duration of the component’s use. A single component may participate multiple times in the same use case, and you can even have different lifelines for different instances of the same component. The horizontal arrows (solid black for synchronous and dashed gray for queued) indicate calls between components.
由于序列图提供了额外的细节层次,因此制作序列图需要更长的时间,但它们通常是演示复杂用例的合适工具,尤其是对于技术受众而言。此外,序列图在后续详细设计中非常有用,有助于定义接口、方法甚至参数。如果您要为详细设计制作它们,那么不妨先为设计验证制作它们,尽管细节较少(例如,暂时省略操作和消息)。
Sequence diagrams take longer to produce due to the additional level of detail they offer, but they are often the right tool to demonstrate a complex use case, especially for a technical audience. In addition, sequence diagrams are extremely useful in subsequent detailed design for helping to define interfaces, methods, and even parameters. If you are going to produce them for the detailed design, you might as well produce them first for design validation, albeit with fewer details (e.g., omit operations and messages for now).
请记住:作为架构师,您的任务不仅是确定一组可以组合在一起以满足所有核心用例的组件,而且还要确定最小的组件集。为什么是最小?最小到底是什么意思?
Remember: Your mission as the architect is to identify not just a set of components that you can put together to satisfy all the core use cases, but the smallest set of components. Why smallest? And what does smallest even mean?
一般来说,你应该创建一个最小化而不是最大化详细设计和实现工作量的架构。在架构方面,少即是多。也就是说,任何架构中的组件数量都存在一个自然的限制。例如,假设你得到了一个包含 300 个用例的需求规范。一方面,满足这些要求的单组件架构构成了最终最小数量的组件,但这样的整体式架构由于其内部复杂性而是一种糟糕的设计(有关服务规模对成本影响的深入讨论,请参阅附录 B)。另一方面,如果你创建一个由 300 个组件组成的架构,每个对应于单个用例,由于集成成本高,这也不是一个好的设计。1 到 300 个组件是足够好的数量。
In general, you should produce an architecture that minimizes, rather than maximizes, the amount of work involved in detailed design and implementation. Less is more when it comes to architecture. That said, a natural constraint exists on the number of components in any architecture. For example, suppose you are given a requirements spec with 300 use cases. On the one hand, a single-component architecture satisfying these requirements constitutes the ultimate smallest number of components, but such a monolith is a horrible design due to its internal complexity (see Appendix B for an in-depth discussion of the effect of service size on cost). On the other hand, if you create an architecture consisting of 300 components, each corresponding to a single use case, that is not a good design either, due to the high integration cost. Somewhere between 1 and 300 components is the good enough number.
当估算存在不确定性时,使用数量级会非常有帮助。例如,在一个有 300 个用例的系统中,有效设计所需的组件数量级是多少?是 1、10、100 还是 1000 个组件?无论系统的具体情况如何,您直观地知道 1、100 和 1000 都是错误答案,因此 10 是一个数量级。
When there is uncertainty in estimation, using orders of magnitude can be very helpful. For example, in a system with 300 use cases, what is the order of magnitude of the number of components required for a valid design? Is it 1, 10, 100, or 1000 components? Regardless of the specifics of the system, you know intuitively that 1, 100, and 1000 are wrong answers, leaving you with 10 as an order of magnitude.
典型软件系统所需的最小服务集在数量级上约为 10 个服务(例如,12 个和 20 个服务的集合都属于 10 这一数量级)。这个特定的数量级是另一个通用设计概念。按数量级计算,您的身体有多少个内部组件?您的汽车呢?您的笔记本电脑呢?由于组合学的缘故,每一个的答案都是大约 10 个。如果系统通过组合 10 个左右的组件来支持所需的行为,那么即使不允许参与的组件重复出现,也不允许使用部分集合,可能的组合数量也十分惊人。因此,即使是少量的有效内部组件也可以支持天文数字般的可能用例。
The smallest set of services required in a typical software system contains 10 services in order of magnitude (e.g., sets of both 12 and 20 are on the order of 10). This particular order of magnitude is another universal design concept. How many internal components does your body have, as an order of magnitude? Your car? Your laptop? For each, the answer is about 10 because of combinatorics. If the system supports the required behaviors by combining the 10 or so components, a staggering number of such combinations becomes possible even without allowing repetition of participating components or partial sets. As a result, even a small number of valid internal components can support an astronomical number of possible use cases.
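The order-of-magnitude reasoning and the combinatorics behind it can be checked with a few lines of Python. This is an illustrative sketch only. The helper order_of_magnitude is hypothetical and rounds on a logarithmic scale, which is one reasonable reading of "on the order of 10". Counting the orderings of all 10 components, each used exactly once, is likewise one interpretation of combinations without repetition or partial sets.

```python
import math


def order_of_magnitude(n):
    """Nearest power of ten, rounding on the logarithmic scale."""
    return 10 ** round(math.log10(n))


# Sets of 12 and 20 services are both on the order of 10; 300 use cases are not.
for count in (12, 20, 300):
    print(count, "is on the order of", order_of_magnitude(count))

# Orderings of 10 components, each participating exactly once (no repetition,
# no partial sets): 10 factorial.
components = 10
full_orderings = math.factorial(components)
print(full_orderings)
```

Even under these restrictive assumptions the count runs into the millions. Allowing partial sets or repeated participation would only make it larger.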
回顾设计良好的软件系统,系统的组件封装了易变的区域。使用方法,即使在大型系统中,您通常也会查看两到五个管理器、两到三个引擎、三到八个资源访问和资源以及六个实用程序。构建块的总数最多为二十几个。如果构建块的数量超过这个数字,您必须将系统分解为逻辑上相关的子集(子系统),这些子集的大小更易于管理。一旦您想不出更小的构建块集,就说明您找到了最佳设计。即使更优秀的架构师能想出更小的组件集,这也无关紧要,因为您的系统不是由其他架构师设计的。每项设计工作总有一个收益递减点,而最小的组件集就是您的收益递减点。
Going back to well-designed software systems, the components of the system encapsulate areas of volatility. Using The Method, even in a large system you are commonly looking at two to five Managers, two to three Engines, three to eight ResourceAccess and Resources, and a half-dozen Utilities. The total number of building blocks will be a dozen or two at the most. With anything larger than this number, you will have to break up the system into logically related subsets (subsystems) that are more manageable in size. Once you cannot think of a smaller set of building blocks, you have found your best design. It does not matter that a better architect could have come up with an even smaller set of components, because that other architect is not designing your system. Every design effort has a point of diminishing returns, and your smallest set is that point.
您可能要花几周甚至几个月的时间来确定核心用例和易变性领域。然而,这不是设计,而是需求收集和需求分析,这确实可能非常耗时。一旦您确定了核心用例和易变性领域,使用该方法需要多长时间才能产生有效的设计?您也可以在这里使用数量级:更像是一个小时?一天?一周?一个月?一年?我希望本书的大多数读者会选择一天或一周,通过练习,您可以将时间缩短到几个小时。如果您知道自己在做什么,设计并不耗时。
You may spend weeks or months trying to identify the core use cases and the areas of volatility. However, that is not design—that is requirements gathering and requirements analysis, which may be very time-consuming indeed. Once you have settled on the core use cases and the areas of volatility, how long will it take you to produce a valid design using The Method? You can use orders of magnitude here as well: Is it more like an hour? A day? A week? A month? A year? I expect most readers of this book to pick a day or a week, and with practice you can bring the time down to a few hours. Design is not time-consuming if you know what you are doing.
将本章与前两章的观察结果结合起来,可以揭示出这一基本的系统设计规则:
Putting together the observations of this chapter along with the previous two chapters reveals this fundamental system design rule:
功能始终是集成的方面,而不是实现的方面。
Features are always and everywhere aspects of integration, not implementation.
这是一条通用的设计规则,它指导着所有系统的设计和实现。正如第 2 章所提到的,“通用”一词的本质也包括软件系统。
This is a universal design rule which governs the design and implementation of all systems. As mentioned in Chapter 2, the very nature of the word “universal” includes software systems.
汽车制造过程就是这条规则的一个简单证明。你的汽车有一个关键功能:它必须把你从 A 地运送到 B 地。如果你观察汽车是如何制造的,你会在什么时候看到这个功能?当你将底盘与发动机缸体、变速箱、座椅、仪表板、驾驶员、道路、保险和燃料整合在一起时,这个功能就会出现。将所有这些整合在一起就产生了这个功能。
The process of building automobiles is a simple demonstration of this rule. Your car has a crucial feature: It must transport you from location A to location B. If you were to observe how a car is manufactured, when would you see this feature? The feature emerges once you have integrated the chassis with the engine block, the gear box, the seats, the dashboard, a driver, a road, insurance, and fuel. Integrating all of these yields the feature.
这条规则更令人印象深刻的是它是一个分形。例如,我现在正在笔记本电脑上输入这本书的手稿,这为我提供了一项非常重要的功能:文字处理。但是,笔记本电脑的架构中是否有一个叫做 Word Processing 的框?笔记本电脑通过集成键盘、屏幕、硬盘、总线、CPU 和内存来提供文字处理功能。这些组件中的每一个也都提供一个功能:CPU 提供计算,硬盘提供存储。但是,如果您检查存储功能,硬盘的设计中是否有一个叫做 Storage 的单独模块?硬盘通过集成内部组件(如内存、内部数据总线、介质、电缆、端口、电源调节器以及将所有东西固定在一起的小螺丝)来提供存储功能。螺丝本身提供了一个非常重要的功能:紧固。但是螺丝如何提供紧固功能呢?螺丝通过集成螺丝头、螺纹和螺丝杆来实现紧固。这些部件的集成提供了紧固功能。您可以按照这种方式一直深入到夸克,但您永远不会看到任何功能。
What is even more impressive with this rule is that it is a fractal. For example, I am typing the manuscript for this book right now on a laptop, which provides me with a very important feature: word processing. But is there any box in the architecture of the laptop called Word Processing? The laptop provides the feature of word processing by integrating the keyboard, the screen, the hard drive, the bus, the CPU, and the memory. Each of these components provides a feature, too: The CPU provides computation, and the hard drive provides storage. Yet if you examine the feature of storage, is there a single block in the drive’s design called Storage? The hard drive provides the storage feature by integrating internal components such as memory, the internal data bus, media, cables, ports, power regulators, and small screws that hold everything together. The screws themselves provide a very important feature: fastening. But how does a screw provide fastening? The screw performs the fastening by integrating the head of the screw, the thread, and its stem. The integration of these provides the fastening feature. You can keep drilling down this way all the way to the quarks, and you will never see a feature.
再读一遍刚刚给出的设计规则。如果你仍然觉得难以接受,那么你已被插入一个矩阵,该矩阵告诉你编写实现某个功能的代码。这样做违背了宇宙的实际组合方式。没有勺子功能。
Read the design rule just given a second time. If you still find it hard to accept, you have been plugged into a matrix that is telling you to write code that implements a feature. Doing so goes against the way the universe is actually put together. There is no spoon feature.
您的软件系统必须响应需求的变化。大多数软件系统都是使用功能分解来实现的,这样可以最大限度地发挥变更的效果。如果设计是基于功能的,那么根据定义,变更永远不会集中在一个地方。相反,它会分散在系统的多个组件和方面。使用功能分解,变更既昂贵又痛苦,因此人们会尽最大努力通过推迟变更来避免痛苦。他们会将变更请求添加到下一个半年发布版本中,因为他们宁愿承受未来的痛苦也不愿承受当前的痛苦。他们甚至可能直接反对变更,向客户解释请求的变更是一个坏主意。
Your software system must respond to changes in the requirements. Most software systems are implemented using functional decomposition, which maximizes the effects of the change. If the design has been based on features, the change, by definition, is never in one place. Instead, it is spread across multiple components and aspects of the system. With functional decomposition, change is expensive and painful, so people do their best to avoid the pain by deferring the change. They will add the change request to the next semi-annual release because they prefer to take future pain over present pain. They may even fight the change outright by explaining to the customer that the requested change is a bad idea.
不幸的是,抵制变革就等于毁掉这个系统。活跃系统是客户使用的系统,而死系统是客户不使用的系统(即使他们仍为之付费)。当开发人员告诉客户该变革将在未来版本中发布时,他们希望客户在开发人员推出所要求的变革之前的六个月内做什么?客户不需要六个月后才需要该功能 — 他们现在就需要该功能。因此,客户将不得不通过使用遗留系统、某些外部媒介或竞争产品来解决这个问题。由于抵制变革会导致客户不再使用系统,因此抵制变革就是毁掉这个系统。响应变革的重要组成部分就是快速响应,即使开发人员从未明确说明过这一点。
Unfortunately, fighting the change is tantamount to killing the system. Live systems are systems that customers use, and dead systems are systems that customers do not use (even if they still pay for them). When developers tell customers that the change will be part of a future release, what do they expect the customers to do in the subsequent six months until the developers roll out the requested change? The customers do not want the feature six months in the future—they need the feature now. Consequently, the customers will have to work around the system by using the legacy system, or some external medium, or a competing product. Since fighting the change results in pushing customers away from using the system, fighting the change is killing the system. Part and parcel of responding to the change is responding quickly, even if that aspect was never explicitly stated.
应对变化的诀窍不是与之对抗、推迟或完全逃避。诀窍在于控制其影响。考虑使用基于波动性的分解和第 3 章的结构指南设计的系统。对需求的更改实际上是对系统所需行为的更改 — 具体来说,是对用例的更改。在方法中,某个管理器实现执行用例的工作流。管理器可能会受到更改的严重影响。也许您甚至需要丢弃该管理器的整个实现并在其位置创建一个新的。但是,管理器集成的底层组件不会受到对所需行为的更改的影响。
The trick to addressing change is not to fight it, postpone it, or punt it altogether. The trick is containing its effects. Consider a system designed using volatility-based decomposition and the structure guidelines of Chapter 3. A change to a requirement is actually a change to the required behavior of the system—specifically, a change to a use case. In The Method, some Manager implements the workflow executing the use case. The Manager may be gravely affected by a change. Perhaps you even need to discard the whole implementation of that Manager and create a new one in its place. However, the underlying components that the Manager integrates are not affected by the change to the required behavior.
回想一下第 3 章,管理器应该是几乎可消耗的。这使您能够吸收变更的成本,以控制变更。此外,任何系统中的大部分工作通常都投入到管理器使用的服务中:
Recall from Chapter 3 that the Manager should be almost expendable. This enables you to absorb the cost of the change, to contain it. Furthermore, the bulk of the effort in any system typically goes into the services that the Manager uses:
实施引擎的成本很高。每个引擎都代表着对系统工作流至关重要的业务活动,并封装了相关的波动性和复杂性。
Implementing Engines is expensive. Each Engine represents business activities vital to the system’s workflows and encapsulates the associated volatility and complexity.
实现ResourceAccess并非易事,这不仅仅是因为编写ResourceAccess代码的成本。识别原子业务动词、将它们转换为某些Resource的访问方法并将它们公开为Resource中立的接口也需要付出巨大努力。
Implementing a ResourceAccess is nontrivial, and not just because of the cost of writing the ResourceAccess code. Identifying the atomic business verbs, translating them into the access methodologies for some Resource, and exposing them as a Resource-neutral interface also takes a significant effort.
设计和实现可扩展、可靠、高性能且可重复使用的资源非常耗时耗力。这些任务可能包括设计数据契约、架构、缓存访问策略、分区、复制、连接管理、超时、锁定管理、索引、规范化、消息格式、事务、传递失败、毒药消息等等。
Designing and implementing Resources that are scalable, reliable, highly performant, and very reusable is time- and effort-consuming. These tasks may include designing data contracts, schemas, cache access policies, partitioning, replication, connection management, timeouts, lock management, indexing, normalization, message formats, transactions, delivery failures, poison messages, and much more.
实现实用程序始终需要顶级技能,而且结果必须值得信赖。实用程序是系统的支柱。世界一流的安全性、诊断、日志记录、消息处理、仪表和托管并非偶然发生。
Implementing Utilities always requires top skills, and the result must be trustworthy. Utilities are the backbone of your system. World-class security, diagnostics, logging, message processing, instrumentation, and hosting do not happen accidentally.
为客户设计出色的用户体验或方便且可重复使用的 API需要耗费大量时间和人力。客户还必须与管理器进行交互和集成。
Designing a superior user experience or a convenient and reusable API for Clients is time- and labor-intensive. The Clients also have to interface and integrate with the Managers.
当Manager发生变更时,您可以挽救并重用在Clients、Engines、ResourceAccess、Resources和Utilities上所做的所有工作。通过在 Manager 中重新集成这些服务,您已控制了变更,并可以快速高效地响应变更。这难道不是敏捷的本质吗?
When a change happens to the Manager, you get to salvage and reuse all the effort that went into the Clients, the Engines, the ResourceAccess, the Resources, and the Utilities. By reintegrating these services in the Manager, you have contained the change and can quickly and efficiently respond to changes. Is that not the essence of agility?
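Containing a change in the Manager can be illustrated with a small Python sketch. All names here (RiskEngine, PaymentAccess, PaymentManagerV1, PaymentManagerV2), the payment domain, and the 1000 threshold are hypothetical. The sketch assumes the requirement changes so that high-risk payments must be held for review, and it assumes the RiskEngine already exists in the system.

```python
class RiskEngine:
    """Hypothetical Engine, assumed to already exist; untouched by the change."""

    def score(self, amount):
        return "high" if amount > 1000 else "low"


class PaymentAccess:
    """Hypothetical ResourceAccess; untouched by the change."""

    def __init__(self):
        self.ledger = []

    def record(self, amount, status):
        self.ledger.append((amount, status))


class PaymentManagerV1:
    """Original workflow: every payment is recorded as paid."""

    def __init__(self, payments):
        self.payments = payments

    def pay(self, amount):
        self.payments.record(amount, "paid")
        return "paid"


class PaymentManagerV2:
    """Replacement workflow after the requirement change.

    The old Manager is discarded; the Engine and ResourceAccess are reused as-is.
    """

    def __init__(self, risk, payments):
        self.risk = risk
        self.payments = payments

    def pay(self, amount):
        status = "held" if self.risk.score(amount) == "high" else "paid"
        self.payments.record(amount, status)
        return status


risk, payments = RiskEngine(), PaymentAccess()
print(PaymentManagerV1(payments).pay(5000))
print(PaymentManagerV2(risk, payments).pay(5000))
print(payments.ledger)
```

In this sketch the whole cost of the change is the new Manager class. The effort invested in the other components is salvaged, which is the containment the text describes.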
前三章介绍了系统设计的通用设计原则。然而,大多数人通过示例学习效果最好。因此,本章通过一个全面的例子展示了前几章中概念的应用:一个案例研究。该案例研究描述了一个名为 TradeMe 的新系统的设计,该系统用于替代遗留系统。该案例研究直接源自 IDesign 为其一位客户设计的实际系统,尽管特定的业务细节被清除和模糊了。从业务案例到分解,系统的本质保持不变:我没有掩盖问题或试图美化情况。正如第 1 章所提到的,设计不应该耗费时间。在这种情况下,一个由经验丰富的 IDesign 架构师和学徒组成的两人设计团队在不到一周的时间内完成了设计。
The previous three chapters presented the universal design principles for system design. However, most people learn best by example. Therefore, this chapter demonstrates the application of concepts from prior chapters with a comprehensive example: a case study. The case study describes the design of a new system called TradeMe, a replacement for a legacy system. The case study is derived directly from an actual system that IDesign designed for one of its customers, albeit with the specific business details scrubbed and obfuscated. The essence of the system remains unchanged, from the business case to the decomposition: I have not glossed over issues or tried to beautify the situation. As mentioned in Chapter 1, design should not be time-consuming. In this case, the design was completed in less than a week by a two-person design team consisting of a seasoned IDesign architect and an apprentice.
本案例研究的目的是展示用于设计过程的思维过程和推论。这些通常很难自己学习,但通过观察别人做这件事并推理正在发生的事情,更容易理解。本章首先概述客户和系统,然后以几个用例的形式介绍需求。对波动区域和架构的识别依赖于方法结构。
The goal of this case study is to show the thought process and the deductions used to produce the design. These are often difficult to learn on your own, but are more easily understood by watching somebody else do it while reasoning about what is taking place. This chapter starts with an overview of the customer and the system, then presents the requirements in the form of several use cases. The identification of the areas of volatility and the architecture relies on The Method structure.
TradeMe 是一个将工匠与承包商和项目进行匹配的系统。工匠可能是水管工、电工、木匠、焊工、测量员、油漆工、电话网络技术员、园丁和太阳能电池板安装工等。他们都独立工作,是自雇人士。每个工匠都有技能水平,有些工匠,如电工,由监管机构认证可以执行某些任务。工匠的报酬率因各种因素而异,例如纪律(焊工的报酬高于木匠)、技能水平、工作年限、项目类型、地点,甚至天气。影响他们工作的其他因素包括监管合规问题(如最低工资或就业税)、风险溢价(如摩天大楼或高压的外部工作)、工匠对某些任务的资格认证(如焊接大梁或电网接入)、报告要求等。
TradeMe is a system for matching tradesmen to contractors and projects. Tradesmen may be plumbers, electricians, carpenters, welders, surveyors, painters, telephone network technicians, gardeners, and solar panel installers, among others. They all work independently and are self-employed. Each tradesman has a skill level, and some, such as electricians, are certified by regulators to do certain tasks. The payment rate for the tradesman varies based on various factors such as discipline (welders are paid more than carpenters), skill level, years of experience, project type, location, and even weather. Other factors affecting their work include regulatory compliance issues (such as minimum wage or employment taxes), risk premium (such as exterior work on skyscrapers or with high voltage), certification of tradesmen’s qualifications for certain kinds of task (such as welding girders or power grid tie-in), reporting requirements, and more.
承包商是总承包商,他们需要临时工,时间从一天到几周不等。承包商通常拥有一支由通才组成的基地团队,全职雇用系统外的通才,使用 TradeMe 进行专业工作。在同一个项目中,不同的时间需要不同的工匠(一个工匠工作一天,另一个工匠工作一周)。工匠可以在一个项目中来来去去。
The contractors are general contractors, and they need tradesmen on an ad hoc basis, from as little as a day to as long as a few weeks. Contractors often have a base crew of generalists whom they employ outside the system on a full-time basis, using TradeMe for the specialized work. On the same project, different tradesmen are needed for different periods of time (one for a day, another for a week) at different times. Tradesmen can come and go on a single project.
TradeMe 系统允许工匠注册,列出他们的技能、他们可提供服务的大致地理区域以及他们期望的工资。它还允许承包商注册,列出他们的项目、所需的行业和技能、项目位置、他们愿意支付的工资、聘用期限以及项目的其他属性。承包商甚至可以请求(但不是坚持)他们想要合作的特定工匠。
The TradeMe system allows tradesmen to sign up, list their skills, their general geographic area of availability, and the rate they expect. It also allows contractors to sign up, list their projects, the required trades and skills, the location of the projects, the rates they are willing to pay, the duration of engagement, and other attributes of the project. Contractors can even request (but not insist upon) specific tradesmen with whom they would like to work.
除了已经提到的因素之外,承包商愿意支付的费率取决于供求关系。当项目闲置时,承包商会提高价格。当商人闲置时,商人会降低价格。对持续时间或要求的承诺也给予了类似的考虑。对于商人来说,理想的项目通常支付高费率并且持续时间短。一旦商人承诺了一个项目,他们就必须坚持到他们承诺的时间。承包商可能会提供更高的报酬和更长的承诺。一般来说,该系统让市场力量设定费率并找到平衡。
Other than the factors already mentioned, the rate the contractor is willing to pay depends on supply and demand. When a project is idle, the contractor will increase the price. When the tradesman is idle, the tradesman will lower the price. Similar consideration is given to the duration or requested commitment. The ideal project for a tradesman often pays a high rate and has a short duration. Once the tradesmen have committed to a project, they have to stay for the amount of time to which they committed. Contractors may offer more pay with longer commitments. In general, the system lets market forces set the rate and find equilibrium.
这些项目都是建筑工程。该系统在新兴市场也可能有用,例如油田或船舶码头。
The projects are construction projects for buildings. The system may also be useful in newly emerging markets, such as oil fields or marine yards.
TradeMe 让工匠和承包商能够互相联系。系统处理请求,并将所需工匠派往工作地点。它还会记录工时和工资,并向当局报告其余事项,这样承包商和工匠就无需亲自处理这些任务了。
TradeMe allows the tradesmen and the contractors to find one another. The system processes the requests and dispatches the required tradesmen to the work sites. It also keeps track of the hours and wages, and the rest of the reporting to the authorities, saving both contractors and tradesmen the hassle of handling these tasks themselves.
该系统将商人与承包商隔离开来。它从承包商那里收取资金并支付给商人。承包商无法绕过该系统直接雇用商人,因为商人在系统中具有排他性。
The system isolates tradesmen from contractors. It collects funds from the contractors and pays the tradesmen. Contractors cannot bypass the system and hire the tradesmen directly because the tradesmen have exclusivity with the system.
TradeMe 系统旨在为商人找到最优惠的价格,为承包商提供最大的可用性。它通过要价和投标价之间的微小差价赚钱。另一个收入来源是商人和承包商支付的会员费。该费用每年收取一次,但可能会发生变化。因此,商人和承包商都被称为系统成员。
The TradeMe system aims to find the best rate for the tradesmen and the most availability for the contractors. It makes money on the small spread between the ask rate and the bid rate. Another source of income is the membership fee that both tradesmen and contractors pay. The fee is collected annually but that could change. Consequently, both tradesmen and contractors are called members in the system.
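To make the spread concrete, here is a minimal arithmetic sketch. It assumes the bid rate is what the contractor pays and the ask rate is what the tradesman receives; the function name `spread_revenue` and all the numbers are illustrative assumptions, not figures from TradeMe.

```python
from decimal import Decimal

def spread_revenue(bid_rate: Decimal, ask_rate: Decimal, hours: Decimal) -> Decimal:
    """Revenue the broker keeps on one engagement: (bid - ask) * hours."""
    if bid_rate < ask_rate:
        raise ValueError("no match: the bid rate is below the ask rate")
    return (bid_rate - ask_rate) * hours

# Hypothetical engagement: the contractor bids 52.00/hour,
# the tradesman asks 50.00/hour, and the work lasts 40 hours.
revenue = spread_revenue(Decimal("52.00"), Decimal("50.00"), Decimal("40"))
print(revenue)  # 80.00
```

Under these assumptions a small per-hour spread only becomes meaningful at volume, which is why the membership fee is a separate source of income.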
目前,九个呼叫中心负责处理大部分任务。每个呼叫中心都针对特定的地区、法规、建筑规范、标准和劳动法。呼叫中心配备了客户代表,称为销售代表。如今,销售代表依靠经验来优化所有项目和可用工匠的调度。一些呼叫中心作为自己的企业运营,而另一些则由同一企业运营。
Presently, nine call centers handle the majority of the assignments. Each call center is specific to a particular locale, regulations, building codes, standards, and labor laws. The call centers are staffed with account representatives called reps. The reps today rely on experience to optimize the scheduling across all projects and available tradesmen. Some call centers operate as their own business, whereas others are operated by the same business.
至少还有一款竞争应用程序更倾向于寻找最便宜的工匠,一些承包商更喜欢该系统。承包商根据价格而不是可用性来选择工匠可能会成为一种日益增长的趋势。
There is also at least one competing application geared more toward finding the cheapest tradesmen, and some contractors prefer that system. Contractors opting for tradesmen based on price as opposed to availability could be a growing trend.
部署在欧洲呼叫中心的旧系统拥有全职用户,他们依赖于连接到数据库的两层桌面应用程序。商人和承包商都会打电话进来,代表输入详细信息,甚至实时进行匹配。一些用于管理会员的基本 Web 门户绕过旧系统,直接与数据库一起工作。各种子系统是孤立的,效率非常低,几乎每一步都需要大量的人工干预。用户需要使用多达五个不同的应用程序来完成他们的任务。这些应用程序是独立的,集成是手动完成的。客户端应用程序充满了业务逻辑,UI 和业务逻辑之间缺乏分离,阻碍了应用程序更新到现代用户体验。
The legacy system, which is deployed in European call centers, has full-time users who rely on a two-tier desktop application connected to a database. Both tradesmen and contractors call in, with the reps entering the details and even performing the matching in real time. Some rudimentary web portals for managing membership bypass the legacy system and work with the database directly. The various subsystems are isolated and very inefficient, requiring a lot of human intervention at almost every step. Users are required to employ as many as five different applications to accomplish their tasks. These applications are independent, and the integration is done by hand. The client applications are chock-full of business logic, and the lack of separation between UI and business logic prevents updating the applications to modern user experience.
每个子系统甚至都有自己的存储库,用户必须协调它们才能理解所有内容。这个过程很容易出错,并且会给新用户带来昂贵的培训和入门时间。
Each subsystem even has its own repository, and the users have to reconcile them to make sense of it all. This process is error prone and imposes expensive training and onboarding time for new users.
遗留系统存在漏洞,其随意的安全方法使其面临许多可能的攻击媒介。遗留系统在设计时从未考虑过安全性。事实上,它根本没有设计,而是自然发展起来的。
The legacy system is vulnerable, and its haphazard approach to security exposes it to many possible attack vectors. The legacy system was never designed with security in mind. For that matter, it was never designed at all, but rather grew organically.
旧版本根本无法容纳以下几个新特性和所需的能力:
The legacy system simply cannot accommodate several new features and desirable capabilities:
移动设备支持
Mobile device support
工作流程自动化程度更高
Higher degree of automation of the workflow
与其他系统的一些连接
Some connectivity to other systems
迁移到云
Migration to the cloud
欺诈检测
Fraud detection
工作质量调查,包括将工匠的安全记录纳入工资和技能水平
Quality of work surveys, including incorporating the tradesman’s safety record in the rate and skill level
进入新市场(例如在船厂部署)
Entering new markets (such as deployment at marine yards)
企业和用户都对旧系统无法与时俱进感到沮丧,而且人们总是在不断地寻求增值功能。其中一项功能,即继续教育,原来是必备的,所以它被拼凑在旧系统之上。旧系统将工匠分配到认证课程和政府要求的测试,并跟踪工匠的进度。虽然外部教育中心提供培训并注册认证,但用户必须手动将它们与旧系统连接起来。虽然与核心系统方面无关,但工匠和企业都非常喜欢这个功能,因为认证功能有助于防止工匠转向竞争对手。
Both the business and users are frustrated with the legacy system’s inability to keep up with the times, and there is a never-ending stream of desired value-added features. One such feature, continuing education, turned out to be a must-have, so it was cobbled on top of the legacy system. The legacy system assigns tradesmen to certification classes and government-required tests and tracks the tradesmen’s progress. Although external education centers provide training and register the certifications, the users have to manually connect them with the legacy system. While unrelated to the core system aspects, tradesmen are really keen on this feature, as is the business, because the certification feature helps prevent tradesmen from moving to the competitors.
旧系统难以遵守各个地区的新法规。处理任何变化都非常困难,而且该系统高度针对其当前的业务环境。由于该公司无法负担为每个地区支持不同版本的系统,因此它产生了将系统简化为各个地区最低标准的动机。这进一步增加了用户的手动工作流程负担,从而降低了效率,增加了培训时间和成本,并导致业务机会的损失。
The legacy system is having trouble complying with new legislation across locales. Dealing with any change is very difficult, and the system is highly specific for its current business context. Since the company cannot afford to support a unique version of the system per locale, it created an incentive to dumb down the system to the lowest common denominator across locales. This further increases the burden on the users in terms of their manual workflows, which decreases efficiency, increases training time and costs, and causes loss of business opportunities.
总体而言,该系统在所有地点拥有约 220 名代表。可扩展性和吞吐量都不是问题。但是,响应能力是一个问题,尽管这可能只是遗留系统的一个副作用。
Overall, the system has some 220 reps across all locations. Neither scalability nor throughput poses a problem. However, responsiveness is an issue, although this may just be a side effect of the legacy system.
鉴于设计不良的遗留系统存在的问题,公司管理层希望设计一个正确的新系统。新系统应尽可能地实现工作自动化。理想情况下,公司希望拥有一个小型呼叫中心,用作自动化流程的备份。该呼叫中心将在所有地区使用单一系统。虽然该系统部署在欧洲,但也有要求在英国[1]甚至加拿大(即欧盟以外)部署该系统。投资新系统的另一个驱动因素是竞争对手的系统更加灵活、高效,用户体验也更出色。
Given the issues of the poorly designed legacy system, the company’s management is interested in designing a new system correctly. The new system should automate the work as much as possible. Ideally, the company would like to have a single, small call center that is used as a backup for an automated process. This call center would use a single system across all locales. While the system is deployed in Europe, there are requests to deploy the system in the United Kingdom[1] and even Canada (i.e., outside the European Union). Another driver for investing in the new system is that the competitors have much more flexible, efficient systems, with superior user experience.
1.虽然这一设计工作是在英国脱欧(英国脱离欧盟)之前进行的,但英国脱欧是一个典型例子,它是一种当时未曾预料到的大规模变革,但新系统却无缝地适应了它。
1. While this design effort took place prior to Brexit (the departure of the United Kingdom from the EU), Brexit is a classic example of a massive change that was unanticipated at the time, yet the new system accommodated it seamlessly.
虽然承包商可以使用多种来源的工匠(包括竞争产品)为项目配备人员,但与竞争产品的集成以及项目优化通常超出了该系统的范围:该公司不从事优化或集成业务。将市场扩大到包括 IT 或护理等其他行业也超出了范围。增加这些市场将重新定义业务的性质,而该公司的强项是将工匠与建筑项目相匹配,而不是一般的人员配备。
While contractors could staff a project using multiple sources of tradesmen, including competing products, integration with competing products and project optimizations in general are beyond the scope of the system: The company is not in the business of optimization or integration. Expanding the marketplace to include other trades such as IT or nursing is also out of scope. Adding these markets would redefine the nature of the business, and the company’s forte is matching tradesmen to construction projects, not general staffing.
该公司将自己视为贸易经纪人,而不是软件组织。软件不是其业务。过去,该公司没有认识到开发优秀软件真正需要什么。该公司没有投入足够的精力进行流程或开发实践。该公司过去建立替代系统的尝试都失败了。该公司确实拥有充足的财务资源——遗留应用程序非常有利可图。过去的惨痛教训让管理层决定翻开新的一页,采用合理的软件开发方法。
The company views itself as a tradesmen broker, not as a software organization. Software is not its business. In the past the company did not acknowledge what it would really take to develop great software. The company did not devote adequate effort to process or development practices. The company’s attempts to build a replacement system in the past all failed. What the company does have is plenty of financial resources—the legacy application is very profitable. The bitter lessons of the past have convinced management to turn a new page and adopt a sound approach for software development.
旧系统和新系统均没有现成的需求文档。客户能够提供图 5-1至5-8,描述一些用例。这些用例可能是也可能不是核心用例;它们只是系统所需的行为。在很大程度上,这些用例反映了遗留系统应该做的事情。由于设计团队正在寻找核心用例,他们忽略了低级用例,例如输入财务详细信息、向承包商收取费用以及向商人支付款项。一些用例,例如继续教育,甚至没有指定。此外,显然还有空间可以添加其他用例来补充公司提供的用例。
There were no existing requirements documents for either the old system or the new system. The customer was able to provide Figures 5-1 to 5-8, depicting some use cases. These may or may not be core use cases; they are simply the required behaviors of the system. To a large extent, the use cases reflected what the legacy system was supposedly doing. Since the design team was looking for core use cases, they ignored low-level use cases such as entering financial details, collecting fees from contractors, and distributing payments to tradesmen. Some use cases, such as continuing education, were not even specified. Moreover, there was clearly room for additional use cases complementing the use cases provided by the company.
图 5-1添加工匠或承包商用例
Figure 5-1 Add Tradesman or Contractor use case
图 5-2请求工匠或承包商用例
Figure 5-2 Request Tradesman or Contractor use case
图 5-3 Match Tradesman 用例
Figure 5-3 Match Tradesman use case
图 5-4指派商人用例
Figure 5-4 Assign Tradesman use case
图 5-5终止 Tradesman 用例
Figure 5-5 Terminate Tradesman use case
图 5-6支付商人用例
Figure 5-6 Pay Tradesman use case
图 5-7创建项目用例
Figure 5-7 Create Project use case
图 5-8关闭项目用例
Figure 5-8 Close Project use case
公司提供的大多数用例看起来都不像核心用例,而只是一些简单功能的列表。回想一下,核心用例代表了业务的本质。系统的本质不是添加工匠或承包商、创建项目或向工匠付款。所有这些任务都可以通过多种方式完成;它们几乎没有增加业务价值,也没有使系统与竞争对手区分开来。相反,系统存在的理由在开头的一句话定义中给出:“TradeMe 是一个将工匠与承包商和项目进行匹配的系统。”唯一与这一点有相似之处的用例是 Match Tradesman 用例(图 5-3)。
Most of the company-provided use cases did not look like core use cases, but rather appeared to be just a list of simple functionalities. Recall that a core use case represents the essence of the business. The essence of the system is not to add a tradesman or contractor, to create a project, or to pay a tradesman. All of these tasks may be done in any number of ways; they add little business value and do not differentiate the system from the competition. Instead, the system’s raison d’être is given in the opening one-sentence definition: “TradeMe is a system for matching tradesmen to contractors and projects.” The only use case with any resemblance to that definition was the Match Tradesman use case (Figure 5-3).
客户很少会以有用的格式提出需求,更不用说以有利于良好设计的方式提出需求。您必须始终转换、澄清和整合原始数据。在设计过程的早期,您甚至可能识别出交互区域,这将使以后将这些区域映射到子系统或层变得更加自然。例如,对于 TradeMe,所有用例中至少有三种类型的角色:用户、市场和会员。用户可以是后台数据输入代表或系统管理员。也许只有管理员才能解雇商人,但图 5-5中没有提供该信息。
Customers rarely ever present the requirements in a useful format, let alone in a way that is conducive to good design. You must always transform, clarify, and consolidate the raw data. Early in the design process you may even recognize areas of interaction that will later make mapping the areas to subsystems or layers much more natural. For example, with TradeMe, there were at least three types of roles across all the use cases: the users, the market, and the members. The users can be back-office data entry reps or system administrators. Perhaps only an administrator can terminate a tradesman, but that information was absent from Figure 5-5.
在活动图中使用“泳道”来显示角色、组织和其他负责实体之间的控制流很有用。例如,图 5-9提供了另一种表达图 5-5中的 Terminate Tradesman 用例的方法。
It is useful to show the flow of control between roles, organizations, and other responsible entities, using “swim lanes” in your activity diagrams. For example, Figure 5-9 provides an alternative way of expressing the Terminate Tradesman use case from Figure 5-5.
图 5-9 使用泳道细分活动图
Figure 5-9 Subdividing the activity diagram with swim lanes
通过将活动图细分为交互区域,可以转换原始用例。这还有助于通过根据需要添加决策框或同步栏来明确系统所需的行为。您将在本章后面看到如何使用泳道技术来启动和验证设计。
You transform the raw use case by subdividing the activity diagram into areas of interactions. This also helps clarify the required behavior of the system by adding decision boxes or synchronization bars as required. You will see how to use the swim lanes technique later on in this chapter to both initiate and validate the design.
第 2 章提到,反设计工作是一种有效的技术,它通过故意尝试设计最糟糕的系统来阻止人们进行功能分解。虽然良好的反设计工作会产生有效的设计,因为它支持用例,但它不提供封装并且表现出紧密耦合。这种设计对其他人来说通常感觉很自然(即他们会制作类似的东西)。反设计很可能是某种功能分解。
Chapter 2 mentioned an anti-design effort as an effective technique to sway people from functional decomposition by deliberately trying to design the worst possible system. While a good anti-design effort produces a valid design because it supports the use cases, it offers no encapsulation and demonstrates tight coupling. Such a design often feels natural to others (i.e., they would have produced something similar). Odds are that the anti-design will be some flavor of functional decomposition.
一个简单的反设计示例是上帝服务——一个丑陋的垃圾场,将所有需求中的功能都放在一个地方实现。虽然这种设计非常常见,甚至有一个名字(Monolith),但现在大多数人已经从惨痛的经历中吸取了教训,不要这样设计。
One simple anti-design example is a god service—an ugly dumping ground of all the functionalities in the requirements, all implemented in one place. While this design is so common it even has a name (the Monolith), by now most people have learned the hard way not to design this way.
图 5-10显示了反设计的另一种表现形式:大量的构建块。用例中的每个活动实际上在架构中都有相应的组件。数据库访问或数据库本身均未封装。
Figure 5-10 shows another take on the anti-design: a massive set of building blocks. Literally every activity in the use cases has a corresponding component in the architecture. There is no encapsulation of either the database access or the database itself.
图 5-10 服务爆炸反设计
Figure 5-10 Services explosion anti-design
有了这么多的细粒度块,客户端就负责实现用例的业务逻辑,如图 5-11所示。将客户端的代码与业务逻辑合并,导致客户端臃肿,整个系统迁移到客户端上,如图2-1所示。
With so many fine-grained blocks, the Clients become responsible for implementing the business logic of the use cases, as shown in Figure 5-11. Contaminating the Client’s code with business logic results in a bloated Client in which the entire system migrates to the Client, as shown in Figure 2-1.
图5-11被污染和膨胀的客户端
Figure 5-11 Polluted and inflated Client
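The polluted Client can be sketched in a few lines. Everything below is a hypothetical illustration of the anti-design, not code from TradeMe: the service functions, the dictionary fields, and the selection rule are all assumptions made up for the example.

```python
from typing import Optional

# Hypothetical fine-grained services, one per activity (the "services explosion").
def verify_request(request: dict) -> bool:
    return bool(request.get("skill")) and request.get("rate", 0) > 0

def find_candidates(request: dict, tradesmen: list) -> list:
    return [t for t in tradesmen if request["skill"] in t["skills"]]

def pick_candidate(request: dict, candidates: list) -> Optional[dict]:
    # First candidate whose expected rate the contractor is willing to pay.
    return next((t for t in candidates if t["rate"] <= request["rate"]), None)

def assign(tradesman: dict, request: dict) -> dict:
    return {"tradesman": tradesman["name"], "project": request["project"]}

# The Client: all sequencing and decision rules of the use case end up here.
def client_match_tradesman(request: dict, tradesmen: list) -> Optional[dict]:
    if not verify_request(request):
        return None
    chosen = pick_candidate(request, find_candidates(request, tradesmen))
    return assign(chosen, request) if chosen else None

tradesmen = [
    {"name": "Ann", "skills": {"plumbing"}, "rate": 60},
    {"name": "Bob", "skills": {"plumbing", "tiling"}, "rate": 45},
]
request = {"project": "P-1", "skill": "plumbing", "rate": 50}
print(client_match_tradesman(request, tradesmen))
# {'tradesman': 'Bob', 'project': 'P-1'}
```

Note that none of the four services knows the use case; only `client_match_tradesman` does. Every additional Client (mobile, web portal, background process) would have to repeat that same sequencing, which is how the system ends up migrating into the Clients.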
或者,您可以让服务相互调用,如图5-12所示。但是,以这种方式将功能强大的服务链接在一起会导致它们之间产生耦合,如图 2-5所示。还请注意图 5-12中向上和横向调用的开放架构问题。
Alternatively, you can have the services call each other, as shown in Figure 5-12. However, chaining the highly functional services together in this way creates coupling between them, as depicted in Figure 2-5. Note also in Figure 5-12 the open architecture issues of calling up and sideways.
图 5-12链接服务反设计
Figure 5-12 Chaining the services anti-design
另一个经典的反设计是沿域线分解,如图 5-13 所示。这里系统沿 Tradesman、Contractor 和 Project 的域线分解。
Another classic anti-design is to decompose along the domain lines, as shown in Figure 5-13. Here the system is decomposed along the domain lines of Tradesman, Contractor, and Project.
图5-13领域分解反设计
Figure 5-13 Domain decomposition anti-design
即使是 TradeMe 这样相对简单的系统,域分解的其他可能性也几乎是无限的,例如 Accounts、Administration、Analytics、Approval、Assignment、Certificates、Contracts、Currency、Disputes、Finance、Fulfillment、Legislation、Payroll、Reports、Requisition、Staffing、Subscription 等等。谁能说 Project 比 Accounts 更适合作为域?应该使用哪些标准来做出判断?
Even with a relatively simple system such as TradeMe, there are nearly limitless additional possibilities for domain decomposition, such as Accounts, Administration, Analytics, Approval, Assignment, Certificates, Contracts, Currency, Disputes, Finance, Fulfillment, Legislation, Payroll, Reports, Requisition, Staffing, Subscription, and so on. Who is to say that Project is a better candidate for a domain than Accounts? And which criteria should be used to make that judgment?
除了第 2 章中讨论的诸多缺点之外,域分解还使得通过展示用例支持来验证设计几乎不可能。例如,对工匠的请求将同时出现在Project和Tradesman域服务上。由于跨域功能重复,因此很难确定谁在何时做什么。
Besides having the many drawbacks discussed in Chapter 2, domain decomposition makes it nearly impossible to validate the design by demonstrating support of the use cases. For example, a request for a tradesman will appear on both the Project and Tradesman domain services. Due to the duplication of functionalities across domain lines, it is ambiguous who is doing what and when.
认识到架构的存在不是为了自身利益,这一点至关重要。架构(和系统)必须服务于业务。服务于业务是任何设计工作的指导方针。因此,您必须确保架构与业务的未来愿景和业务目标保持一致。此外,您必须具有从业务目标到架构的完全双向可追溯性。您必须能够轻松指出架构如何以某种方式支持每个目标,以及架构的每个方面如何从业务目标中衍生出来。替代方案是毫无意义的设计和孤立的业务需求。
It is of the utmost importance to recognize that the architecture does not exist for its own sake. The architecture (and the system) must serve the business. Serving the business is the guiding light for any design effort. As such, you must ensure that the architecture is aligned with the vision that the business has for its future and with the business objectives. Moreover, you must have complete bidirectional traceability from the business objectives to the architecture. You must be able to easily point out how each objective is supported in some way by the architecture, and how each aspect of the architecture is derived from some objectives of the business. The alternatives are pointless designs and orphaned business needs.
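Bidirectional traceability can be checked mechanically once objectives and components are written down. The sketch below is only an illustration: the objective names are taken from the list later in this chapter, while the component names and the mapping between components and objectives are placeholders, not the actual TradeMe design.

```python
def traceability_gaps(objectives: set, supports: dict) -> tuple:
    """Return (orphaned objectives, pointless components), each sorted."""
    covered = set().union(*supports.values())
    # Objectives that no component supports: orphaned business needs.
    orphaned = sorted(objectives - covered)
    # Components that support no stated objective: pointless design.
    pointless = sorted(c for c, objs in supports.items() if not objs & objectives)
    return orphaned, pointless

objectives = {
    "Unify the repositories and applications",
    "Quick turnaround for new requirements",
    "Streamline security",
}
# Placeholder mapping: which component claims to support which objective.
supports = {
    "Component A": {"Quick turnaround for new requirements"},
    "Component B": {"Unify the repositories and applications"},
    "Component C": set(),
}
orphaned, pointless = traceability_gaps(objectives, supports)
print(orphaned)   # ['Streamline security']
print(pointless)  # ['Component C']
```

An empty result in both directions is the goal: every objective traces to at least one aspect of the architecture, and every component traces back to at least one objective.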
如前几章所述,设计架构师必须首先识别易变性领域,然后将这些领域封装在系统组件、操作概念和基础架构中。组件的集成支持所需的行为,而集成的方式则实现业务目标。例如,如果关键目标是可扩展性和灵活性,那么通过消息总线集成组件是一个很好的解决方案(稍后将详细介绍这一点)。相反,如果关键目标是性能和简单性,那么引入消息总线会增加过多的复杂性。
As discussed in the previous chapters, the architect producing the design has to first recognize the areas of volatility and then encapsulate these areas in system components, operational concepts, and infrastructure. The integration of the components is what supports the required behaviors, and the way the integration takes place is what realizes the business objectives. For example, if a key objective is extensibility and flexibility, then integrating the components over a message bus is a good solution (more on this point later). Conversely, if the key objective is performance and simplicity, introducing a message bus contributes too much complexity.
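The extensibility argument for a message bus can be illustrated with a tiny in-process stand-in. This is a sketch under stated assumptions, not a real bus and not the TradeMe implementation: the `MessageBus` class, the topic name, and the message fields are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable

class MessageBus:
    """In-process stand-in for a message bus: topics map to subscriber callbacks."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver the message to every subscriber; return how many received it."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)

bus = MessageBus()
log = []
# Two independent subscribers; the publisher knows about neither of them.
bus.subscribe("tradesman.requested", lambda m: log.append(("match", m["skill"])))
bus.subscribe("tradesman.requested", lambda m: log.append(("audit", m["skill"])))
delivered = bus.publish("tradesman.requested", {"skill": "electrician"})
print(delivered, log)
# 2 [('match', 'electrician'), ('audit', 'electrician')]
```

Adding the second subscriber required no change to the publishing code, which is the flexibility the text refers to. The price is the indirection itself, which is why the same mechanism is a poor fit when the key objective is performance and simplicity.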
本章的其余部分详细介绍了将业务需求转化为 TradeMe 设计的步骤。这些步骤从捕捉系统愿景和业务目标开始,然后推动设计决策。
The rest of this chapter provides a detailed walkthrough of the steps that transform the business needs into a design for TradeMe. These steps start with capturing the system vision and the business objectives, which then drive the design decisions.
在任何环境中,很少有人会对系统应该做什么有相同的看法。有些人可能根本没有看法。其他人的看法可能与其他人不同,或者只服务于他们狭隘的利益。有些人可能会误解业务目标。TradeMe 背后的公司因未能跟上不断变化的市场而陷入了无数额外的问题。这些问题反映在现有系统、公司结构和软件开发设置方式中。新系统必须正面解决所有问题,而不是零敲碎打,因为只解决其中一些问题不足以取得成功。
Seldom will everyone in any environment share the same vision as to what the system should do. Some may have no vision at all. Others may have a different vision than the rest or a vision that serves only their narrow interests. Some may misinterpret the business goals. The company behind TradeMe was stymied by a myriad of additional issues resulting from its failure to keep up with the changing market. These issues were reflected in the existing systems, in the company’s structure, and in the way software development was set up. The new system had to tackle all the issues head-on rather than in a piecemeal fashion, because solving just some of them was insufficient for success.
首要任务是让所有利益相关者就共同愿景达成一致。愿景必须推动一切,从架构到承诺。你之后所做的一切都必须服务于这一愿景并以此为依据。当然,这是双向的——这就是为什么从愿景开始是个好主意。如果某件事不符合愿景,那么它通常与政治和其他次要或第三要务有关。这为你提供了一种极好的方式来拒绝不支持商定愿景的无关要求。在 TradeMe 的案例中,设计团队将愿景提炼为一句话:
The first order of business is to get all stakeholders to agree on a common vision. The vision must drive everything, from architecture to commitments. Everything that you do later has to serve that vision and be justified by it. Of course, this cuts both ways—which is why it is a good idea to start with the vision. If something does not serve the vision, then it often has to do with politics and other secondary or tertiary concerns. This provides you with an excellent way of repelling irrelevant demands that do not support the agreed-upon vision. In the case of TradeMe, the design team distilled the vision to a single sentence:
一个用于构建应用程序以支持 TradeMe 市场的平台。
A platform for building applications to support the TradeMe marketplace.
好的愿景既简洁又明确。你应该像读法律声明一样去阅读它。请注意,TradeMe 的愿景是构建一个平台来构建应用程序。这种平台思维解决了企业渴望的多样性和可扩展性,并且可能适用于您设计的系统。
A good vision is both terse and explicit. You should read it like a legal statement. Note that the vision for TradeMe was to build a platform on which to build the applications. This kind of platform mindset addressed the diversity and extensibility the business craved and may be applicable in systems you design.
在就愿景达成一致后(并且只有在那时),您可以将愿景逐项列出具体目标。您应该拒绝所有不符合愿景的目标;您应该包括所有对支持愿景至关重要的目标。这两种类型通常很容易挑选出来。列出目标时,您应该采用业务视角。您不能让工程或营销人员主导对话,也不能包括技术目标或具体要求。设计团队从 TradeMe 系统概述中提取了以下目标:
After agreeing on the vision (and only then), you can itemize the vision into specific objectives. You should reject all objectives that do not serve the vision; you should include all objectives that are essential to support the vision. These two types are usually easy to pick out. When you list objectives, you should adopt a business perspective. You must not allow the engineering or marketing people to own the conversation, or to include technology objectives or specific requirements. The design team extracted the following objectives from the TradeMe system overview:
统一存储库和应用程序。旧系统存在太多低效问题,需要大量人工干预才能使系统保持最新状态并正常运行。
Unify the repositories and applications. The legacy system had entirely too many inefficiencies, requiring a lot of human intervention to keep the system up to date and running.
快速满足新需求。传统功能的交付时间非常糟糕。新平台必须允许非常快速、频繁的定制,通常只针对特定技能、一周中的某个时间、项目类型以及这些的任意组合进行定制。理想情况下,从编码到部署,这种快速交付的大部分工作都应该实现自动化。
Quick turnaround for new requirements. The legacy turnaround time for features was abysmal. The new platform had to allow very fast, frequent customization, often tailored just for a specific skill, time of the week, project type, and any combination of these. Ideally, much of this quick turnaround should be automated, from coding to deployment.
支持跨国家和跨市场的高度定制。由于法规、立法、文化和语言的差异,本地化是一个令人难以置信的痛点。
Support a high degree of customization across countries and markets. Localization was an incredible pain point because of differences in regulations, legislation, cultures, and languages.
支持全面的业务可视性和责任制。传统系统不具备欺诈检测、审计跟踪和监控功能。
Support full business visibility and accountability. Fraud detection, audit trails, and monitoring were nonexistent in the legacy system.
前瞻性地看待技术和法规。系统必须预测变化,而不是永远处于被动模式。该公司设想,这就是 TradeMe 击败竞争对手的方法。
Forward looking on technology and regulations. Instead of being in perpetual reactive mode, the system must anticipate change. The company envisioned that this was how TradeMe would defeat the competitors.
与外部系统良好集成。虽然与上一个目标有些关联,但这里的目标是实现以前费力的手动流程的高度自动化。
Integrate well with external systems. Although somewhat related to the previous objective, the objective here is to enable a high degree of automation over previously laborious manual processes.
简化安全性。系统必须得到妥善保护,每个组件的设计都必须考虑到安全性。为了实现安全目标,开发团队必须将安全审计等安全活动引入软件生命周期并在架构中支持它。
Streamline security. The system must be properly secured, and literally every component must be designed with security in mind. To meet the security objective, the development team must introduce security activities such as security audits into the software life cycle and support it in the architecture.
这可能令人惊讶,但阐明愿景(企业将获得什么)和目标(企业为何想要这个愿景)往往是不够的。人们通常过于纠结于细节而无法将各个点联系起来。因此,您还应该指定一个使命宣言(您将如何实现它)。TradeMe 的使命宣言是:
It may come as a surprise, but articulating the vision (what the business will receive) and the objectives (why the business wants the vision) is often insufficient. People are usually too mired in the details and cannot connect the dots. Thus, you should also specify a mission statement (how you will do it). The TradeMe Mission Statement was:
设计和构建一组软件组件,开发团队可以将其组装成应用程序和功能。
Design and build a collection of software components that the development team can assemble into applications and features.
这个使命声明故意没有将开发功能确定为使命。使命不是构建功能——使命是构建组件。现在,证明基于波动性的分解符合使命声明变得容易得多,因为所有点都是相互关联的:
This mission statement deliberately does not identify developing features as the mission. The mission is not to build features—the mission is to build components. It now becomes much easier to justify volatility-based decomposition that serves the mission statement because all the dots are connected:
愿景 → 目标 → 使命宣言 → 架构
Vision → Objectives → Mission Statement → Architecture
事实上,您刚刚迫使业务部门指示您设计正确的架构。这与典型的动态相反,在前者中,架构师恳求管理层避免功能分解。通过将架构与业务愿景、目标和使命陈述保持一致,可以更轻松地在整个业务中推动正确的架构。一旦您让他们就愿景、目标和使命陈述达成一致,他们就会站在您这边。如果您希望业务人员支持您的架构工作,您必须展示架构如何服务于业务。
In fact, you have just compelled the business to instruct you to design the right architecture. This is a reversal of the typical dynamics, in which the architect pleads with management to avoid functional decomposition. It is a lot easier to drive the correct architecture through the business by aligning the architecture with the business’s vision, its objectives, and the mission statement. Once you have them agree on the vision, the objectives, and then the mission statement, you have them on your side. If you want the business people to support your architecture effort, you must demonstrate how the architecture serves the business.
误解和混淆是软件开发中普遍存在的现象,经常会导致冲突或期望落空。营销人员可能对同一事物使用与工程人员不同的术语,甚至更糟的是,他们使用相同的术语但含义不同。这种歧义可能多年都未被发现。在深入系统设计之前,请编制一个简短的领域术语表,以确保每个人都在同一页面上。
Misunderstanding and confusion are endemic to software development and often lead to conflict or unmet expectations. Marketing may use different terms than engineering for the same thing or, even worse, may use the same term but mean a different thing. Such ambiguities may go undetected for years. Before you dive into the act of system design, ensure everyone is on the same page by compiling a short glossary of domain terminology.
开始编写词汇表的一个好方法是回答四个经典问题:“谁”、“什么”、“如何”和“在哪里”。您可以通过检查系统概述、用例和客户访谈记录(如果有)来回答这些问题。对于 TradeMe,这四个问题的答案如下:
A good way of starting a glossary is to answer the four classic questions of “who,” “what,” “how,” and “where.” You answer the questions by examining the system overview, the use cases, and customer interview notes, if you have any. For TradeMe, the answers to the four questions were as follows:
谁
– 商人
– 承包商
– TradeMe 代表
– 教育中心
– 后台进程(例如支付调度程序)
Who
– Tradesmen
– Contractors
– TradeMe reps
– Education centers
– Background processes (e.g., scheduler for payment)
什么
– 商人和承包商的会员资格
– 建筑项目市场
– 继续教育证书和培训
What
– Membership of tradesmen and contractors
– Marketplace of construction projects
– Certificates and training for continuing education
如何
– 搜索
– 遵守法规
– 访问资源
How
– Searching
– Complying with regulations
– Accessing resources
在哪里
– 本地数据库
– 云
– 其他系统
Where
– Local database
– Cloud
– Other systems
回想一下第 3 章,您通常可以将这四个问题的答案映射到层,即使不是架构本身的组件。
Recall from Chapter 3 that you often can map the answers to the four questions to layers, if not to components of the architecture itself.
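As a rough sketch, the glossary can be kept as plain data and grouped by candidate layer. The glossary entries are the ones listed above; the mapping of each question to a layer follows the usual convention (who to Clients, what to Managers, how to Engines or ResourceAccess, where to Resources) and is stated here as an assumption, since this section does not restate it.

```python
# Answers to the four questions, as listed for TradeMe.
GLOSSARY = {
    "who": ["Tradesmen", "Contractors", "TradeMe reps", "Education centers",
            "Background processes"],
    "what": ["Membership of tradesmen and contractors",
             "Marketplace of construction projects",
             "Certificates and training for continuing education"],
    "how": ["Searching", "Complying with regulations", "Accessing resources"],
    "where": ["Local database", "Cloud", "Other systems"],
}

# Assumed question-to-layer convention; adjust it to your own layering.
LAYER_FOR_QUESTION = {
    "who": "Clients",
    "what": "Managers",
    "how": "Engines or ResourceAccess",
    "where": "Resources",
}

def candidates_by_layer(glossary: dict, layer_for_question: dict) -> dict:
    """Group glossary answers under the layer their question maps to."""
    result = {}
    for question, answers in glossary.items():
        result.setdefault(layer_for_question[question], []).extend(answers)
    return result

layers = candidates_by_layer(GLOSSARY, LAYER_FOR_QUESTION)
print(layers["Resources"])  # ['Local database', 'Cloud', 'Other systems']
```

The output is only a list of candidates to reason about. Whether any of them earns a component still depends on whether it encapsulates a volatility.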
“什么”列表尤其有趣,因为它强烈暗示了可能的子系统或前面提到的泳道。在寻找易变区域时,您可以使用泳道和答案来播种和启动分解工作。这并不排除有额外的子系统,也不意味着这些子系统一定是所需的所有子系统——您总是根据易变性进行分解,如果“什么”不是易变的,那么它就不值得在架构中成为组件。此时,它提供的只是一个很好的起点来推理您的设计。
The list of the “what” is of particular interest because it hints strongly at possible subsystems or the swim lanes mentioned previously. You can use the swim lanes and the answers to seed and initiate your decomposition effort as you look for areas of volatility. This does not preclude having additional subsystems or imply that these will necessarily be all the subsystems needed; you always decompose based on volatility, and if a “what” is not volatile, then it will not merit a component in the architecture. At this point all it provides is a nice starting point to reason about your design.
分解的本质在于识别前几章中概述的波动区域。以下列表重点介绍了 TradeMe 的一些候选波动性以及设计团队考虑的因素:
The essence of the decomposition is in identifying the areas of volatility as outlined in the previous chapters. The following list highlights a few of the candidate volatilities for TradeMe and the factors the design team considered:
商人。这是系统中波动的领域吗?如果您需要为商人添加属性,那么很难说架构(即使是纯功能架构)会受到很大影响。换句话说,商人是可变的,但不是波动的。对于商人的任何属性子集(例如技能集),情况也是如此。也许商人本身并不是波动的。也许存在一种更通用的波动性,例如会员管理或法规,与商人有密切关系。以这种方式讨论波动性候选者甚至挑战它们很重要。如果您不能清楚地说明波动性是什么、它为什么会波动以及波动性在可能性和影响方面带来什么风险,那么您需要进一步研究。将商人确定为波动性领域表明沿域线分解(见图 5-13)。
Tradesman. Is this an area of volatility in the system? It is hard to claim that the architecture, even a purely functional one, would suffer to a large extent if you need to add attributes to the tradesman. In other words, tradesman is variable but not volatile. This is also true for any subset of attributes of the tradesman (e.g., skill sets). Maybe the tradesman is not volatile in isolation. Perhaps there exists a more generic volatility, such as membership management or regulations, that has affinity with the tradesman. It is important to discuss the volatility candidates this way and even challenge them. If you cannot clearly state what the volatility is, why it is volatile, and what risk the volatility poses in terms of likelihood and effect, then you need to look further. Identifying tradesman as an area of volatility signals decomposition along domain lines (see Figure 5-13).
教育证书。认证过程是否不稳定?如果是,那么从业务和系统的角度来看,真正的波动性到底是什么?在这种情况下,波动性出现在将项目所需认证的法规与经过适当认证的商人相匹配的工作流程中。认证本身只是商人的一个属性。从企业的角度来看,认证管理永远是商人经纪的核心附加值的次要部分。
Education certificates. Is the certification process volatile? If so, what exactly is the true volatility from the point of view of the business and the system? In this case, the volatility arises in the workflow of matching the regulations governing required certifications for projects with appropriately certified tradesmen. The certification itself is just an attribute of the tradesman. From the business’s perspective, certification management will forever be secondary to the core added value of being a tradesmen brokerage.
项目。项目波动性是否值得拥有自己的管理器?Project Manager 意味着项目上下文。Market Manager 更好,因为系统需要管理的某些活动可能不需要正在运行的项目上下文来执行。例如,您可以要求市场提出匹配,而无需考虑特定项目,或者匹配可能需要涉及多个项目。也许为了留住有价值的商人,您希望向商人支付一笔定金,而不管任何项目。将项目识别为波动性表现为域分解。核心波动性是市场,而不是项目。
Projects. Is project volatility deserving of its own Manager? A Project Manager implies a project context. A Market Manager is better because some activities that the system needs to manage may not require a context of a running project to execute. For example, you can ask the market to propose a match without having a specific project in mind, or a match may require involving multiple projects. Perhaps to retain a valuable tradesman you wish to pay the tradesman a retainer, irrespective of any project. Identifying projects as a volatility manifests as domain decomposition. The core volatility is the marketplace, not the projects.
提出某些波动性领域,然后检查最终架构,这并没有错。如果结果产生了蜘蛛网状的相互作用或不对称,那么设计就不太可能好。您可能会感觉到设计是否正确。
There is nothing wrong with suggesting certain areas of volatility, and then examining the resultant architecture. If the result produces a spiderweb of interactions or is asymmetric, then the design is unlikely to be good. You will probably sense whether the design is correct or not.
有时,波动区域可能位于系统之外。例如,虽然由于付款方式多种多样,付款很可能是一个波动区域,但 TradeMe 作为一个软件项目并不是要实施支付系统。付款是系统核心价值的附属品。系统可能会使用许多外部支付系统作为资源。资源可能是整个系统,每个系统都有自己的波动性,但这些超出了该系统的范围。
Sometimes an area of volatility may reside outside the system. For example, while payments may very well be a volatile area due to the various ways in which you could issue payments, TradeMe as a software project was not about implementing a payment system. The payments are ancillary to the core value of the system. The system will likely use a number of external payments systems as Resources. Resources may be whole systems, each with its own volatilities, but these are outside the scope of this system.
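One way to picture external payment systems as Resources is to hide them behind a single access component. The sketch below is hypothetical throughout: `PaymentResource`, `BankTransferSystem`, `CardProcessorSystem`, and `PaymentAccess` are names invented for the example, and the two provider classes are fakes standing in for whole external systems.

```python
from decimal import Decimal
from typing import Protocol

class PaymentResource(Protocol):
    """What the system needs from any external payment system (hypothetical)."""
    def transfer(self, payee: str, amount: Decimal) -> str: ...

# Fake stand-ins for external systems; each real one has its own volatilities.
class BankTransferSystem:
    def transfer(self, payee: str, amount: Decimal) -> str:
        return f"bank:{payee}:{amount}"

class CardProcessorSystem:
    def transfer(self, payee: str, amount: Decimal) -> str:
        return f"card:{payee}:{amount}"

class PaymentAccess:
    """The only place in the system that knows which payment Resource is in use."""

    def __init__(self, resource: PaymentResource) -> None:
        self._resource = resource

    def pay(self, payee: str, amount: Decimal) -> str:
        if amount <= 0:
            raise ValueError("amount must be positive")
        return self._resource.transfer(payee, amount)

receipt = PaymentAccess(BankTransferSystem()).pay("tradesman-42", Decimal("800.00"))
print(receipt)  # bank:tradesman-42:800.00
```

Swapping `BankTransferSystem` for `CardProcessorSystem` changes nothing for the callers of `PaymentAccess`, so the way payments are issued can vary without that volatility leaking into the rest of the system.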
The design team produced the following list of areas volatile enough to affect the architecture. The list also identifies the corresponding components of the architecture that encapsulate the areas of volatility:
Client applications. The system should allow several distinct client environments to evolve separately at their own pace. The clients cater to different users (tradesmen, contractors, marketplace reps, or education centers) or to background processes, such as a timer that periodically interacts with the system. These client applications may use different UI technologies, devices, or APIs (perhaps the education portal is a mere API); they may be accessed locally or across the Internet (tradesmen versus reps); they may be connected or disconnected; and so on. As expected, the clients are associated with a lot of volatility. Each one of these volatile client environments is encapsulated in its own Client application.
Managing membership. There is volatility in the activities of adding or removing tradesmen and contractors, and even the benefits or discounts they get. Membership management changes across locales and over time. These volatilities are encapsulated in the Membership Manager.
Fees. All the possible ways TradeMe can make money, combining volume and spread, are encapsulated in the Market Manager.
Projects. Project requirements and size not only change but also are volatile and affect the required behavior. Small projects may require different workflows from large projects. The system encapsulates projects in the Market Manager.
Disputes. When dealing with people, at best misunderstandings will arise; at worst outright fraud happens. The volatility in handling dispute resolution is encapsulated by the Membership Manager.
Matching and approvals. Two volatilities come into play here. The volatility of how to find a tradesman that matches the project needs is encapsulated in the Search Engine. The volatility of search criteria and the definition thereof is encapsulated in the Market Manager.
Education. There is volatility in matching a training class to a tradesman and in searching for an available class or a required class. Managing the education workflow volatility is encapsulated in the Education Manager. Searching for classes and certifications is encapsulated in the Search Engine. Compliance with regulatory certification is encapsulated in the Regulation Engine.
Regulations. Regulations are likely to change in any given country as time goes by. In addition, the regulations can be internal to the company. This volatility is encapsulated in the Regulation Engine.
Reports. All the requirements of reporting and auditing with which the system needs to comply are encapsulated in the Regulation Engine.
Localization. Two distinct volatilities relate to localization. UI elements of the Clients encapsulate the volatility in language and culture. For TradeMe, the stakeholders considered this a good enough solution. In other cases, localization could be a strong enough volatility that it would merit its own subsystem (e.g., Manager, Resources). Localization may even affect the design of the Resources. The volatility in regulations between countries is captured in the Regulation Engine.
Resources. The Resources may be portals to external systems (such as payment) or may store various elements such as lists of tradesmen and projects. The exact nature of the store is volatile, potentially ranging from a cloud-based database to a local store to a whole other system.
Resource access. ResourceAccess components encapsulate the volatility of accessing the Resources such as the location of the storage, its type, and access technology. The ResourceAccess components convert atomic business verbs such as “pay” (e.g., paying a tradesman) into accessing the relevant Resources such as storage and payment systems.
Deployment model. The deployment model is volatile. Sometimes data cannot leave a geographic area, or the company may wish to deploy parts or whole systems in the cloud. These volatilities are encapsulated in the composition of the subsystems and the Message Bus utility. The advantages of this modular composable interaction pattern in the system operational concepts are described later.
Authentication and authorization. The system can authenticate the Clients in a number of ways, whether they are users or even other systems, and there are multiple options for representing credentials and identities. Authorization is nearly open-ended, with many ways of storing roles or representing claims. These volatilities are encapsulated in the Security Utility component.
Note that the mapping of areas of volatility to components of the architecture is not 1:1. For example, the preceding list maps three areas of volatility to the Market Manager. Recall from Chapter 3 that a Manager encapsulates the volatility of a family of logically related use cases, not just a single use case. In the case of the Market Manager, these market use cases are managing projects, matching tradesmen to projects, and charging the fees for the match.
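The many-to-many nature of this mapping can be made concrete with a small sketch. The Python below transcribes part of the preceding list into a dictionary and inverts it. The dictionary layout and the function name are illustrative assumptions, and the Client, Resource, ResourceAccess, and deployment entries are left out for brevity.

```python
from collections import defaultdict

# Areas of volatility and the components that encapsulate them,
# transcribed from the list above (a partial, illustrative transcription).
VOLATILITY_TO_COMPONENTS = {
    "Managing membership": ["Membership Manager"],
    "Fees": ["Market Manager"],
    "Projects": ["Market Manager"],
    "Disputes": ["Membership Manager"],
    "Matching and approvals": ["Search Engine", "Market Manager"],
    "Education": ["Education Manager", "Search Engine", "Regulation Engine"],
    "Regulations": ["Regulation Engine"],
    "Reports": ["Regulation Engine"],
    "Localization": ["Client applications", "Regulation Engine"],
    "Authentication and authorization": ["Security Utility"],
}


def areas_by_component(mapping):
    """Invert the mapping: component name -> list of volatility areas."""
    inverted = defaultdict(list)
    for area, components in mapping.items():
        for component in components:
            inverted[component].append(area)
    return dict(inverted)


COMPONENT_TO_AREAS = areas_by_component(VOLATILITY_TO_COMPONENTS)

for component, areas in sorted(COMPONENT_TO_AREAS.items()):
    print(f"{component}: {', '.join(areas)}")
```

Inverting the table shows both directions of the mismatch: one area (such as Education) can be spread over several components, and one component (such as the Market Manager) can absorb several areas.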
Two additional, weaker areas of volatility are not reflected in the architecture:
Notification. How the Clients communicate with the system and how the system communicates with the outside world could be volatile. The use of the Message Bus Utility encapsulates that volatility. If the company had a strong need for open-ended forms of transport such as email or fax, then perhaps a Notification Manager would have been necessary.
Analysis. TradeMe could analyze the requirements of projects and verify the requested tradesmen or even propose them in the first place. In this way, TradeMe could optimize the tradesmen assignment to projects. The system could analyze projects in various ways, with such analysis clearly being a volatile area. However, the design team rejected analysis as an area of volatility in the design because, as stated, the company is not in the business of optimizing projects. Providing optimizations, therefore, falls into speculative design. Any analysis activity required is folded into the Market Manager.
Figure 5-14 shows the static view of the architecture.
Figure 5-14 Static view of the TradeMe architecture
The client tier contains a portal for each type of member, the tradesmen and the contractors. There is also a portal for the education center to issue or validate tradesman credentials and a marketplace application for the back-end users to administer the marketplace. In addition, the client tier contains external processes such as a scheduler or a timer that periodically initiates some behavior with the system. These are included in the architecture for reference, but are not part of the system.
In the business logic tier are the Membership Manager and Market Manager, encapsulating the respective volatilities discussed previously. In short, the Membership Manager manages the volatility in the execution of the membership use cases, while the Market Manager is in charge of the use cases pertaining to the marketplace. Note that the use cases of membership (such as adding or removing a tradesman) are both logically related to each other and distinct from those related to the marketplace, such as matching a tradesman to a project. The Education Manager encapsulates the volatility in the execution of use cases related to continuing education, such as coordinating training and reviewing the education certificates.
There are only two Engines, which encapsulate some of the acute volatilities listed previously. The Regulation Engine encapsulates the regulation and compliance volatility between different countries and even in the same country over time. The Search Engine encapsulates the volatility in producing a match, something that can be done in an open-ended number of ways, ranging from a simple rate lookup, to safety and quality record considerations, to AI and machine learning techniques for the assignments.
The entities required when managing a marketplace, such as payments, members, and projects, all have some storage and corresponding ResourceAccess components. There is also workflow storage, as discussed later.
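As a rough illustration of how a ResourceAccess component might turn an atomic business verb such as "pay" into access to the underlying Resources, consider the sketch below. The class names (`PaymentsAccess`, `InMemoryPaymentGateway`, `InMemoryPaymentsStore`) and the transaction ID format are hypothetical stand-ins, not part of the TradeMe design.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryPaymentGateway:
    """Stand-in for an external payment system (a Resource)."""
    transfers: list = field(default_factory=list)

    def transfer(self, account: str, amount_cents: int) -> str:
        self.transfers.append((account, amount_cents))
        return f"txn-{len(self.transfers)}"


@dataclass
class InMemoryPaymentsStore:
    """Stand-in for the payments storage (a Resource)."""
    records: dict = field(default_factory=dict)

    def save(self, txn_id: str, record: dict) -> None:
        self.records[txn_id] = record


class PaymentsAccess:
    """Hypothetical ResourceAccess component exposing the business verb 'pay'."""

    def __init__(self, gateway, store):
        self._gateway = gateway
        self._store = store

    def pay(self, tradesman_id: str, amount_cents: int) -> str:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        # The caller never sees which payment system or storage is used.
        txn_id = self._gateway.transfer(account=tradesman_id, amount_cents=amount_cents)
        self._store.save(txn_id, {"tradesman": tradesman_id, "amount_cents": amount_cents})
        return txn_id


gateway = InMemoryPaymentGateway()
store = InMemoryPaymentsStore()
payments = PaymentsAccess(gateway, store)
txn = payments.pay("tradesman-17", 25_000)
print(txn, store.records[txn])
```

Swapping the gateway or the store for a different technology would leave the `pay` call unchanged, which is the kind of encapsulation the ResourceAccess layer is meant to provide.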
The system requires three Utilities: Security, Message Bus, and Logging. Any future Utilities (e.g., instrumentation) would go in the Utility bar as well.
With TradeMe, all communication between all Clients and all Managers takes place over the Message Bus Utility. Figure 5-16 illustrates this operational concept.
Figure 5-16 The abstract system interaction pattern
In this interaction pattern, the Clients and the business logic in the subsystems are decoupled from each other by the Message Bus. Use of the Message Bus in general supports the following operational concepts:
All communication utilizes a common medium (the Message Bus). This encapsulates the nature of the messages, the location of the parties, and the communication protocol.
Use case initiators (such as Clients) and use case executors (such as Managers) never interact directly. Because they are unaware of each other, they can evolve separately, which fosters extensibility.
A multiplicity of concurrent Clients can interact in the same use case, with each performing its part of the use case. There is no lock-step execution across Clients and system. This, in turn, leads to timeline separation and decoupling of the components along the timeline.
High throughput is possible because the queues underneath the Message Bus can accept a very large number of messages per second.
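The operational concepts above can be sketched with a toy, in-memory bus. This is only an illustration under simplifying assumptions: the `MessageBus` class, its method names, and the topic string are invented here, and a real message bus would add durable queues, routing, and security.

```python
from collections import defaultdict, deque


class MessageBus:
    """Toy in-memory stand-in for the Message Bus Utility."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of handlers
        self._queue = deque()                  # pending (topic, body) pairs

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def post(self, topic, body):
        # Posting only enqueues; the poster never calls a Manager directly.
        self._queue.append((topic, body))

    def pump(self):
        """Deliver queued messages until the queue is empty; return the count."""
        delivered = 0
        while self._queue:
            topic, body = self._queue.popleft()
            for handler in list(self._subscribers[topic]):
                handler(body)
            delivered += 1
        return delivered


bus = MessageBus()
handled = []

# A Manager-side handler registers interest in a topic.
bus.subscribe("membership.add_tradesman", lambda body: handled.append(body["name"]))

# Two Clients post requests; nothing executes yet (timeline separation).
bus.post("membership.add_tradesman", {"name": "Alex"})
bus.post("membership.add_tradesman", {"name": "Sam"})
handled_before_pump = list(handled)

delivered = bus.pump()
print(handled_before_pump, delivered, handled)
```

The poster and the handler share only the topic name and the message shape, so either side can be replaced without touching the other.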
The operational concepts that a message bus supports are certainly nice to have, but by themselves may not justify the increased complexity. The main reason for choosing a message bus is because it supports the most important operational concept of TradeMe: the Message Is the Application design pattern.
When using this design pattern, the “application” is nowhere to be found. There is no collection of components or services that you can point to and identify as the application. Instead, the system comprises a loose collection of services that post messages to and receive messages from one another (over a message bus, although that is a secondary consideration). These messages are related to each other. Each service processing a message does some unit of work, and then posts a message back to the bus. Other services will subsequently examine the message, and some of them (or one of them, or none of them) will decide to do something. In effect, the message posted by one service triggers another service to do something unbeknownst to the posting service. This stretches decoupling almost to the limit.
Often the same logical message may traverse all the services. Likely the services will add additional contextual information to the message (such as in the headers), modify previous context, pass context from the old message to a new one, and so on. In this way, the services act as transformation functions on the messages. The paramount aspect of the Message Is the Application pattern is that the required behavior of the application is the aggregate of those transformations plus the local work done by the individual services. Any required behavior changes induce changes in the way your services respond to the messages, rather than the architecture or the services.
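One way to picture services acting as transformation functions on a message is the sketch below. The `Message` type, the service functions, the message kinds, and the `run` loop are all hypothetical; the loop merely simulates the bus handing the current message to every service.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Message:
    kind: str                                    # logical state of the message
    body: dict
    headers: dict = field(default_factory=dict)  # context accumulated en route


def verify_request(msg):
    """Hypothetical service: reacts only to new tradesman requests."""
    if msg.kind != "tradesman.requested":
        return None
    return replace(msg, kind="request.verified",
                   headers={**msg.headers, "verified": True})


def propose_match(msg):
    """Hypothetical service: reacts only to verified requests."""
    if msg.kind != "request.verified":
        return None
    return replace(msg, kind="match.proposed",
                   headers={**msg.headers, "candidate": "tradesman-17"})


def run(msg, services):
    """Offer the message to every service until none of them reacts."""
    trail = [msg.kind]
    progressed = True
    while progressed:
        progressed = False
        for service in services:
            result = service(msg)
            if result is not None:
                msg = result
                trail.append(msg.kind)
                progressed = True
    return msg, trail


request = Message(kind="tradesman.requested",
                  body={"project": "p-9", "trade": "electrician"})
final, trail = run(request, [propose_match, verify_request])
print(trail)
print(final.headers)
```

No service names another service, and the order in which they are registered does not matter. The overall behavior is the sum of the individual transformations, so adding a behavior means adding another function that reacts to an existing message kind.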
The business objectives for TradeMe justified the use of this pattern because of the required extensibility. The company can extend the system by adding message processing services, thereby avoiding modification of existing services and risk to a working implementation. This correctly supports the directive from Chapter 3 that you should always build systems incrementally, not iteratively. The objective of forward-looking design is also well served here because nothing in this pattern ties the system to the present requirements. This pattern is also an elegant way of integrating external systems—yet another business objective.
As with everything in life, implementing this pattern comes with a cost. Not every organization can justify using the pattern or even having a message bus. The cost will almost always take the form of additional system complexity and moving parts, new APIs to learn, deployment and security issues, intricate failure scenarios, and more. The upside is an inherently decoupled system geared toward requirements churn, extensibility, and reuse. In general, you should use this pattern when you can invest in a platform and have the backing of your organization both top-down and bottom-up. In many cases, a simpler design in which the Clients just queue up calls to the Managers would be a better fit for the development team. Always calibrate the architecture to the capability and maturity of the developers and management. After all, it is a lot easier to morph the architecture than it is to bend the organization. Once the organizational capabilities have matured, you can incorporate a full Message Is the Application pattern.
With The Method, the Managers encapsulate the volatility in the business workflows. Nothing prevents you from simply coding the workflows in the Managers and then, when the workflows change, changing the code in the Managers. The problem with this approach is that the volatility in the workflows may exceed the developers’ ability, as measured in time and effort, to catch up using just code.
The next operational concept in TradeMe is the use of workflow Managers. I hinted at this concept in Chapter 2 in the discussion of the stock trading system, but this chapter codifies it as another operational pattern. All Managers in TradeMe are workflow Managers. A workflow Manager is a service that enables you to create, store, retrieve, and execute workflows. In theory, it is just another Manager. In practice, however, such Managers almost always utilize some sort of third-party workflow execution tool and workflow storage. For each Client call, the workflow Manager loads not just the correct workflow type but also a specific instance of it, with a particular state and context; executes the workflow; and persists it back to the workflow store. Loading and saving the workflow instance each time supports long-running workflows. The Manager also does not have to maintain any kind of a session with the Client while remaining state-aware. Each call from the same user in the same workflow execution can come from a different device on a different connection and carries with it the unique ID of the instance of the workflow that the Manager should load and execute, as well as information about the client such as its address (e.g., URI).
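The load, execute, and persist cycle described above might look roughly like the following. Everything here is an assumption made for illustration: the in-memory store, the step names, and the rule that each call advances the workflow by one step. A real workflow Manager would delegate to a third-party workflow tool.

```python
import uuid

# Hypothetical workflow type: an ordered list of step names.
WORKFLOW_TYPES = {
    "add_tradesman": ["submit_application", "verify_credentials", "activate_member"],
}


class InMemoryWorkflowStore:
    """Stand-in for the workflow storage."""

    def __init__(self):
        self._instances = {}

    def save(self, instance_id, state):
        self._instances[instance_id] = dict(state)

    def load(self, instance_id):
        return dict(self._instances[instance_id])


class WorkflowManager:
    """Keeps no session state; everything lives in the store."""

    def __init__(self, store):
        self._store = store

    def start(self, workflow_type):
        instance_id = str(uuid.uuid4())
        self._store.save(instance_id, {"type": workflow_type,
                                       "completed_steps": [],
                                       "last_client": None})
        return instance_id

    def handle_call(self, instance_id, client_address):
        state = self._store.load(instance_id)        # rehydrate the instance
        steps = WORKFLOW_TYPES[state["type"]]
        done = state["completed_steps"]
        if len(done) < len(steps):                   # execute the next step
            state["completed_steps"] = done + [steps[len(done)]]
        state["last_client"] = client_address
        self._store.save(instance_id, state)         # persist it back
        return state


store = InMemoryWorkflowStore()
instance_id = WorkflowManager(store).start("add_tradesman")

# Each call arrives from a different device and hits a fresh Manager object.
for address in ["https://phone.example/cb",
                "https://laptop.example/cb",
                "https://tablet.example/cb"]:
    state = WorkflowManager(store).handle_call(instance_id, address)

print(state["completed_steps"], state["last_client"])
```

Because a new `WorkflowManager` object serves every call, the sketch shows that the only things tying the calls together are the instance ID carried by the caller and the state kept in the store.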
To add or change a feature, you simply add or change the workflows of the Managers involved, but not necessarily the implementation of the individual participating services. This is a clean way to provide features as aspects of integration (as discussed in Chapter 4) and is a tangible aspect of the mission statement for the system, allowing you to illustrate how the architecture supports the business.
The real necessity for using a workflow Manager arises when the system must handle high volatility. With a workflow Manager, you merely edit the required behavior and deploy the newly generated code. The nature of this editing is specific to the workflow tool you choose. For example, some tools use script editors, whereas others use visual workflows that look like activity diagrams and generate or even deploy the workflow code.
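To give a feel for editing behavior rather than code, the sketch below treats a workflow definition as data that names steps from a fixed library. The JSON layout, the step names, and the small versus large project definitions are hypothetical; actual workflow tools use their own formats and editors.

```python
import json


def verify_license(ctx):
    ctx["log"].append("license verified")


def check_safety_record(ctx):
    ctx["log"].append("safety record checked")


def assign(ctx):
    ctx["log"].append(f"assigned to {ctx['project']}")


# The core services: stable code that the workflows merely sequence.
STEP_LIBRARY = {
    "verify_license": verify_license,
    "check_safety_record": check_safety_record,
    "assign": assign,
}

# Workflow definitions are plain data, editable without touching the steps.
SMALL_PROJECT = '{"steps": ["verify_license", "assign"]}'
LARGE_PROJECT = '{"steps": ["verify_license", "check_safety_record", "assign"]}'


def run_workflow(definition_json, project):
    """Execute the steps named in the definition, in order."""
    definition = json.loads(definition_json)
    ctx = {"project": project, "log": []}
    for name in definition["steps"]:
        STEP_LIBRARY[name](ctx)
    return ctx["log"]


small_log = run_workflow(SMALL_PROJECT, "p-1")
large_log = run_workflow(LARGE_PROJECT, "p-2")
print(small_log)
print(large_log)
```

Changing the required behavior for large projects amounts to editing one line of the definition, while the step implementations stay untouched.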
You can even (with the right safeguards) have the product owners or the end users edit the required behavior. This drastically reduces the cycle time for delivering features and allows the software development team to focus on the core services as opposed to chasing changes in the requirements.
The business needs for TradeMe justified the use of this pattern because the objective of a quick turnaround for features is impossible to meet using hand-crafted coding by a small, thinly spread team. Use of workflow Managers enables a high degree of customization across markets, satisfying another objective for the system.
Again, evaluate carefully whether this concept is applicable to your particular case. Make sure the level of workflow volatility justifies the additional complexity, learning curves, and changes to the development process.
You must know before work commences whether the design can support the required behaviors. As Chapter 4 explains, to validate your design, you need to show that the design can support the core use cases by integrating the various areas of volatility encapsulated in your services. You validate the design by showing the respective call chain or sequence diagram for each use case. You may require more than one diagram to complete a use case.
It is important to demonstrate that your design is valid not just to yourself, but also to others. If you cannot validate your architecture, or if the validation is too ambiguous, you need to go back to the drawing board.
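Call chains are normally validated by drawing and reviewing diagrams, but part of the check can be mechanized. The sketch below encodes a call chain as caller and callee pairs and flags any hop that bypasses the Message Bus between Clients and Managers or between two Managers, in line with the operational concept described earlier. The component table, the `Members Access` name, and the rule set are assumptions made for illustration, not a complete statement of The Method's rules.

```python
# Hypothetical classification of a few TradeMe components.
COMPONENT_KIND = {
    "Tradesman Portal": "Client",
    "Marketplace App": "Client",
    "Message Bus": "Utility",
    "Membership Manager": "Manager",
    "Market Manager": "Manager",
    "Regulation Engine": "Engine",
    "Members Access": "ResourceAccess",
}

# Pairs of kinds that may only communicate over the Message Bus.
BUS_ONLY = {("Client", "Manager"), ("Manager", "Client"), ("Manager", "Manager")}


def violations(call_chain):
    """Return a description of every hop that bypasses the Message Bus."""
    problems = []
    for caller, callee in call_chain:
        kinds = (COMPONENT_KIND[caller], COMPONENT_KIND[callee])
        if kinds in BUS_ONLY:
            problems.append(f"{caller} -> {callee} must go through the Message Bus")
    return problems


# A chain shaped like the Add Tradesman use case.
add_tradesman = [
    ("Tradesman Portal", "Message Bus"),
    ("Message Bus", "Membership Manager"),
    ("Membership Manager", "Regulation Engine"),
    ("Membership Manager", "Members Access"),
    ("Membership Manager", "Message Bus"),
]

# A flawed variant in which one Manager calls another directly.
flawed = add_tradesman + [("Membership Manager", "Market Manager")]

print(violations(add_tradesman))
print(violations(flawed))
```

A check like this does not replace the diagrams, but it can catch a design that quietly reintroduces direct coupling.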
As mentioned previously, the few company-provided use cases for TradeMe included just a single candidate for a core use case: Match Tradesman. The architecture of TradeMe was modular and decoupled from all the use cases to such an extent that the design team could demonstrate that it supported all the provided use cases, not just the core Match Tradesman use case. The next section illustrates the validation of the TradeMe use cases and the operational concepts of the new system.
The Add Tradesman/Contractor use case involves several areas of volatility: the tradesman (or contractor) Client applications, the workflow of adding a member, compliance with regulations, and the payment system used. You can rearrange and simplify the use case from Figure 5-1 by adding swim lanes to the diagram, as shown in Figure 5-17.
Figure 5-17 The Add Tradesman/Contractor use case with swim lanes
Figure 5-17 shows that the execution of the use case requires interaction between a Client application and the membership subsystem. This is evident in the actual call chains of Figure 5-18 (the Adding Contractor use case is identical but with the contractor’s application, the Contractors Portal). Following the operational concepts of TradeMe, in Figure 5-18 the Client application (in this case, either the Tradesman Portal when the member is applying directly or the Marketplace App when the back-end rep is adding the member) posts the request to the Message Bus.
Figure 5-18 The Add Tradesman/Contractor call chain
Upon receiving the message, the Membership Manager (which is a workflow Manager) loads the appropriate workflow from the workflow storage. This either kicks off a new workflow or rehydrates an existing one to carry on with the workflow execution. Once the workflow has finished executing the request, the Membership Manager posts a message back into the Message Bus indicating the new state of the workflow, such as its completion, or perhaps indicating that some other Manager can start its processing now that the workflow is in a new state. Clients can monitor the Message Bus as well and update the users about their requests. The Membership Manager consults the Regulation Engine that is verifying the tradesman or contractor, adds the tradesman or contractor to the Members store, and updates the Clients via the Message Bus.
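The Add Tradesman call chain can be approximated in a few dozen lines. This is a sketch under stated assumptions: the toy `MessageBus`, the topic names, the licensing rule in `RegulationEngine`, and the `MembersAccess` class are invented for illustration, and the workflow loading step is omitted to keep the example short.

```python
from collections import defaultdict, deque


class MessageBus:
    """Toy in-memory stand-in for the Message Bus Utility."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self._queue = deque()

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def post(self, topic, body):
        self._queue.append((topic, body))

    def pump(self):
        while self._queue:
            topic, body = self._queue.popleft()
            for handler in list(self._subscribers[topic]):
                handler(body)


class RegulationEngine:
    def is_compliant(self, applicant):
        # Hypothetical rule: an applicant must hold a license.
        return bool(applicant.get("license"))


class MembersAccess:
    """Hypothetical ResourceAccess component in front of the Members store."""

    def __init__(self):
        self.members = {}

    def add_member(self, applicant):
        self.members[applicant["id"]] = applicant


class MembershipManager:
    def __init__(self, bus, regulations, members):
        self._bus, self._regulations, self._members = bus, regulations, members
        bus.subscribe("membership.add_requested", self.on_add_requested)

    def on_add_requested(self, applicant):
        if self._regulations.is_compliant(applicant):
            self._members.add_member(applicant)
            self._bus.post("membership.added", {"id": applicant["id"]})
        else:
            self._bus.post("membership.rejected", {"id": applicant["id"]})


bus = MessageBus()
members = MembersAccess()
manager = MembershipManager(bus, RegulationEngine(), members)

# The Client (for example, the Tradesman Portal) monitors the bus for outcomes.
notifications = []
bus.subscribe("membership.added", lambda b: notifications.append("added:" + b["id"]))
bus.subscribe("membership.rejected", lambda b: notifications.append("rejected:" + b["id"]))

bus.post("membership.add_requested", {"id": "t-1", "license": "EL-204"})
bus.post("membership.add_requested", {"id": "t-2"})
bus.pump()
print(notifications, sorted(members.members))
```

The Client code never references `MembershipManager`, and the Manager learns about Clients only through the messages it receives.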
The Request Tradesman use case includes two areas of interest: the contractor and the market (Figure 5-19). After initial verification of the request, this use case triggers another use case, Match Tradesman.
Figure 5-19 The Request Tradesman use case with swim lanes
The call chains are depicted in Figure 5-20. Clients such as the Contractors Portal or the internal user of the Marketplace App post a message to the bus requesting a tradesman. The Market Manager receives that message. The Market Manager loads the workflow corresponding to this request, and performs actions such as consulting with the Regulation Engine about what may be valid for this request or updating the project with the request for a tradesman. The Market Manager can then post back to the Message Bus that someone is requesting a tradesman. This will trigger the matching and assignment workflows, all separated on the timeline.
Figure 5-20 Request Tradesman call chains (until matching)
The Match Tradesman core use case involves multiple areas of interest. The first is who initiated the tradesman request that has triggered the match use case. That initiator could be a Client (a contractor or the marketplace reps), as in Figure 5-20, but it could also be a timer or any other subsystem that kicks off the match workflow. The other areas of interest are the market, regulations, search, and ultimately membership, as shown in Figure 5-21.
Figure 5-21 The Match Tradesman use case with swim lanes
Once you realize that regulations and search are all elements of the market, you can refactor the activity diagram to that shown in Figure 5-22. This enables easy mapping to your subsystems design.
Figure 5-22 Refactored swim lanes for the Match Tradesman use case
Figure 5-23 depicts the corresponding call chain. Again, this call chain is symmetrical with other call chains, in the sense that the first action is to load the appropriate workflow and execute it. The last call of the call chain to the Message Bus and to the Membership Manager triggers the Assign Tradesman use case.
Figure 5-23 Call chains for the Match Tradesman use case
Notice the composability of this design. For example, suppose the company really does need to handle acute volatility in analyzing the project’s needs. The call chain for finding a match allows for separating search from analysis. You would add an Analysis Engine to encapsulate the separate set of analysis algorithms. The business can even leverage TradeMe for some business intelligence to answer questions like “Could we have done things better?” For example, a call chain similar to Figure 5-23 could be used for the much more involved scenario of “Analyze all projects between 2016 and 2019” and the design of the components would not have to change at all. The number of these use cases is likely open, and that is the whole point: You have an open-ended design that can be extended to implement any of these future scenarios, a true composable design.
The Assign Tradesman use case involves four areas of interest (Figure 5-24): client, membership, regulations, and market. Note that the use case is independent of who triggered it, whether an actual internal user or just a request message off the bus from another subsystem. For example, the Match Tradesman use case could trigger the assignment use case as a direct continuation of the workflow in the case of automatic match and assignment.
Figure 5-24 The Assign Tradesman use case swim lanes
Again, after refactoring the activity diagram, it is easy to map to subsystems (Figure 5-25).
Figure 5-25 Unified swim lanes of the Assign Tradesman use case
As with all previous call chains, Figure 5-26 shows how the Membership Manager executes the workflow that ultimately leads to assigning the tradesman to the project. This is a collaboration between the Membership Manager and the Market Manager, with each managing its respective subsystem. Note that the Membership Manager is unaware of the Market Manager; it just posts a message to the bus. The Market Manager receives that message and updates the project according to its internal workflow. The Market Manager may, in turn, post another message to the Message Bus to trigger another use case, such as issuing a report on the project, or billing the contractor, or pretty much anything. This is what the Message Is the Application design pattern is all about: The logical “assignment” message weaves its way between the services, triggering local behaviors as it goes. The Client can also monitor the Message Bus and may advise the user that the assignment is in progress.
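The way the assignment message travels between services, and how a new behavior can be attached without touching existing ones, can be sketched as follows. The synchronous dispatch, the topic names, and the billing handler are simplifications and assumptions made for illustration; a real bus would queue the messages.

```python
from collections import defaultdict

subscribers = defaultdict(list)  # topic -> handlers
events = []                      # topics in the order they were posted


def subscribe(topic, handler):
    subscribers[topic].append(handler)


def post(topic, body):
    """Deliberately tiny synchronous stand-in for the Message Bus."""
    events.append(topic)
    for handler in list(subscribers[topic]):
        handler(body)


projects = {"p-9": {"tradesmen": []}}
invoices = []


def membership_on_assignment_requested(body):
    # Membership Manager: does its local work, then announces the result.
    post("tradesman.assigned", body)


def market_on_tradesman_assigned(body):
    # Market Manager: updates the project and posts a follow-up message.
    projects[body["project"]]["tradesmen"].append(body["tradesman"])
    post("project.updated", {"project": body["project"]})


def billing_on_project_updated(body):
    # Added later; no existing handler was modified to support it.
    invoices.append(body["project"])


subscribe("assignment.requested", membership_on_assignment_requested)
subscribe("tradesman.assigned", market_on_tradesman_assigned)
subscribe("project.updated", billing_on_project_updated)

post("assignment.requested", {"project": "p-9", "tradesman": "t-1"})
print(events)
print(projects, invoices)
```

Each handler knows only the topic it listens to and the topic it posts to, so the billing behavior was introduced purely by adding a subscriber.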
Figure 5-26 Call chains for the Assign Tradesman use case
In the previous use cases, the initial diagram swim lanes included the regulations area, which was subsequently consolidated into the membership subsystem. Since this was such a recurring pattern, Figure 5-9 shows the refactored diagram for the Terminate Tradesman use case. This diagram still provides enough differentiation to allow for clear mapping to the design.
图 5-27显示了终止商人的调用链。Market Manager启动终止工作流程并通知Membership Manager终止事宜。
Figure 5-27 shows the call chain for terminating a tradesman. The Market Manager initiates the termination workflow and notifies the Membership Manager of the termination.
图 5-27 Terminate Tradesman 用例的调用链
Figure 5-27 Call chains for the Terminate Tradesman use case
任何错误情况或偏离“正确路径”都会添加一条虚线灰色箭头,从后端Membership Manager指向Message Bus最终返回到客户端。图 5-28是演示此交互的序列图,其中没有ResourceAccess服务和Resources之间的调用。
Any error condition or deviation from the “happy path” would add a dashed gray arrow from the Membership Manager back to the Message Bus and ultimately back to the client. Figure 5-28 is a sequence diagram demonstrating this interaction, without the calls between the ResourceAccess services and the Resources.
图 5-28 Terminate Tradesman 用例的序列图
Figure 5-28 Sequence diagram for the Terminate Tradesman use case
最后,图 5-27中的调用链图(或图 5-28中的序列图)假设终止用例在项目完成时触发,承包商终止指派的商人。但它也可以由商人从Tradesman Portal向Membership Manager发布消息来触发,这将导致调用链以相反的方向流动(Membership Manager流向Market Manager并继续流向客户端应用程序)。这再次证明了设计的多功能性。
Finally, the call chain diagram in Figure 5-27 (or the sequence diagram of Figure 5-28) assumes the termination use case is triggered when a project is completed, and the contractor terminates the assigned tradesmen. But it can also be triggered by the tradesman posting a message from the Tradesman Portal to the Membership Manager, which would cause the call chain to flow in the opposite direction (Membership Manager to Market Manager and on to the Client apps). Again, this is a testimony to the versatility of the design.
其余用例与迄今为止描述的用例的交互和设计模式密切相关,因此这里只对它们进行简要描述。还请注意调用链中的高度自相似性或对称性。图 5-6显示了 Pay Tradesman 用例,其验证调用链如图 5-29所示。
The rest of the use cases closely follow the interactions and design pattern of the use cases described thus far, so only brief descriptions of them appear here. Also note the high degree of self-similarity or symmetry in the call chains. Figure 5-6 showed the Pay Tradesman use case, and its validating call chain is in Figure 5-29.
图 5-29 Pay Tradesman 用例的调用链
Figure 5-29 Call chains for the Pay Tradesman use case
与之前的调用链不同,付款由客户已启用的调度程序或计时器触发。调度程序与实际组件分离,不了解系统内部情况:它所做的只是向总线发送消息。实际付款是在PaymentAccess更新Payments商店并访问外部支付系统(TradeMe 的资源)时进行的。
Unlike the previous call chains, the payment is triggered by a scheduler or timer that the customer already has in service. The scheduler is decoupled from the actual components and has no knowledge of the system internals: All it does is post a message to the bus. The actual payment is made by PaymentAccess when updating the Payments store and accessing an external payment system, which is a Resource to TradeMe.
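A minimal Python sketch of this scheduler-triggered chain follows. Everything except the `PaymentAccess` name is an assumption made for illustration: the `post` function, the `payment.due` topic, the `fake_external_payment` stub, and the amounts are hypothetical. The passage also does not say which Manager receives the message, so the subscriber here is a plain function.

```python
from typing import Callable, Dict, List

Handler = Callable[[dict], None]
handlers: Dict[str, List[Handler]] = {}


def post(topic: str, message: dict) -> None:
    """Hypothetical bus entry point: deliver the message to whoever subscribed."""
    for handler in handlers.get(topic, []):
        handler(message)


class Scheduler:
    """Knows only how to post a message; it has no reference to any service."""

    def __init__(self, post_message: Callable[[str, dict], None]) -> None:
        self._post = post_message

    def tick(self, tradesman: str, amount_cents: int) -> None:
        self._post("payment.due", {"tradesman": tradesman, "amount_cents": amount_cents})


def fake_external_payment(tradesman: str, amount_cents: int) -> str:
    """Stand-in for the external payment system; returns a made-up confirmation."""
    return f"ok-{tradesman}-{amount_cents}"


class PaymentAccess:
    """Pays through the external system and records the result in the Payments store."""

    def __init__(self, store: List[dict], pay_externally: Callable[[str, int], str]) -> None:
        self._store = store
        self._pay_externally = pay_externally

    def pay(self, tradesman: str, amount_cents: int) -> str:
        confirmation = self._pay_externally(tradesman, amount_cents)
        self._store.append(
            {"tradesman": tradesman, "amount_cents": amount_cents, "confirmation": confirmation}
        )
        return confirmation


payments_store: List[dict] = []
payment_access = PaymentAccess(payments_store, fake_external_payment)


def on_payment_due(message: dict) -> None:
    # In the real design a Manager workflow would sit between the bus and PaymentAccess.
    payment_access.pay(message["tradesman"], message["amount_cents"])


handlers["payment.due"] = [on_payment_due]
Scheduler(post).tick("t-17", 25_000)
```

Because the scheduler receives only a posting function, it could be replaced by any other timer the customer already runs without changing the services behind the bus.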
在另一个简单的用例中,Market Manager通过执行相应的工作流来响应创建项目的请求(参见图 5-30和图 5-7中的用例图)。无论这需要多少步骤,或涉及多少错误,工作流管理器模式的性质都允许根据需要进行尽可能多的排列。
In another simple use case, the Market Manager responds to the request to create a project by executing a corresponding workflow (see Figure 5-30 and the use case diagram in Figure 5-7). Regardless of how many steps this takes, or how many errors are involved, the nature of the workflow Manager pattern allows for as many permutations as needed.
图 5-30创建项目用例的调用链
Figure 5-30 Call chains for the Create Project use case
关闭项目用例涉及Market Manager和Membership Manager(参见图 5-31和图 5-8中的用例)。TradeMe 再次通过这两个主要抽象之间的相互作用完成了这项任务;交互与图 5-27中显示的交互相同。
The Close Project use case involves both the Market Manager and the Membership Manager (see Figure 5-31 and the use case in Figure 5-8). Again, TradeMe accomplishes this task with the interplay between these two major abstractions; the interaction is identical to that shown in Figure 5-27.
图 5-31关闭项目用例的调用链
Figure 5-31 Call chains for the Close Project use case
本书的第一部分以冗长的系统设计案例研究作为结束。掌握系统设计只是成功的第一步。接下来是项目设计。你应该趁热打铁:始终将系统设计与项目设计结合起来,最好是连续进行,作为一项持续的设计工作。
This lengthy system design case study concludes the first part of this book. Having the system design in hand is just the first ingredient of success. Next comes project design. You should strike while the iron is hot: Always follow system design with project design, ideally back-to-back, as a continuous design effort.
就像设计软件系统一样,您必须设计项目才能构建系统。这包括准确计算计划的持续时间和成本、设计几个好的执行选项、调度资源,甚至验证您的计划以确保其合理可行。项目设计需要了解服务和活动之间的依赖关系、集成的关键路径、人员分布以及所涉及的风险。所有这些挑战都源于您的系统设计,妥善解决它们是一项工程任务。因此,作为负责的工程师,您(软件架构师)负责设计项目。
Much as you design the software system, you must design the project to build the system. This includes accurately calculating the planned duration and cost, devising several good execution options, scheduling resources, and even validating your plan to ensure it is sensible and feasible. Project design requires understanding the dependencies between services and activities, the critical path of integration, the staff distribution, and the risks involved. All of these challenges stem from your system design, and addressing them properly is an engineering task. As such, it is up to you, the software architect, as the engineer in charge, to design the project.
您应该将项目设计视为系统设计工作的延续。将系统设计和项目设计结合起来会产生非线性效应,从而大大提高项目成功的可能性。同样重要的是要注意,项目设计不是项目管理的一部分。相反,项目设计之于项目管理就像架构之于编程。
You should think of project design as a continuation of the system design effort. Combining system design and project design yields a nonlinear effect that drastically improves the likelihood of success for the project. It is also important to note that project design is not part of project management. Instead, project design is to project management what architecture is to programming.
本书的第二部分是关于项目设计的。接下来的章节介绍了传统思想以及我原创的、经过实践检验的技术和方法,涵盖了现代软件项目设计的核心知识体系。本章提供了项目设计的背景和基本动机。
The second part of this book is all about project design. The following chapters present conventional ideas along with my original, battle-proven techniques and methodologies, covering the core body of knowledge of modern software project design. This chapter provides the background and essential motivation for project design.
没有哪个项目有无限的时间、金钱或资源。所有合理的项目计划总是会用金钱换取时间,反之亦然。此外,对于任何给定项目,都有无数种可能的时间和成本组合。如果您有一名开发人员或四名开发人员,如果您有两年或六个月的时间,或者如果您试图将风险降至最低并最大限度地提高成功的可能性,那么您将拥有不同的项目。
No project has infinite time, money, or resources. Sound project plans always trade some time for money, or vice versa. Furthermore, for any given project, there are numerous possible combinations of schedule and cost. Whether you have one developer or four, two years or six months, or a goal of minimizing risk and maximizing the probability of success, each choice yields a different project.
在设计项目时,您必须为管理层提供几个可行的交易计划、成本和风险选项,让管理层和其他决策者能够提前选择最适合他们需求和期望的解决方案。在项目设计中提供选项是成功的关键。寻找平衡的解决方案甚至最佳解决方案是一项高度工程化的设计任务。我说它是“工程化的”,不仅仅是因为其中涉及设计和计算,还因为工程就是权衡和适应现实。
When you design a project you must provide management with several viable options trading schedule, cost, and risk, allowing management and other decision makers to choose up front the solution that best fits their needs and expectations. Providing options in project design is the key to success. Finding a balanced solution and even an optimal solution is a highly engineered design task. I say it is “engineered” not just because of the design and calculations involved, but because engineering is all about tradeoffs and accommodating reality.
增加项目设计挑战的现实是,即使对于同一组约束,也没有单一的正确解决方案,就像任何系统总是有几种可能的设计方法一样。为满足紧迫的时间表而设计的项目将比为降低成本和最小化风险而设计的项目花费更多,风险更大,也更复杂。没有“项目”;只有选择。您的任务是将这个几乎无数的可能性范围缩小到几个好的项目设计选项,例如:
Adding to the challenge of project design is the reality that no single correct solution exists even for the same set of constraints, much as there are always several possible design approaches for any system. Projects designed to meet an aggressive schedule will cost more and be far more risky and complex than projects designed to reduce cost and minimize risk. There is no “THE Project”; there are only options. Your task is to narrow this spectrum of near-countless possibilities to several good project design options, such as the following:
构建系统最便宜的方法
The least expensive way to build the system
交付系统的最快方式
The fastest way to deliver the system
履行承诺的最安全方式
The safest way of meeting your commitments
进度、成本和风险的最佳组合
The best combination of schedule, cost, and risk
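To make the idea of options concrete, the following Python sketch picks the first three kinds of option from a small candidate set. The `ProjectOption` type, the option names, and every number are invented for illustration only, and the 0-to-1 risk scale is an assumption. Finding the best combination of schedule, cost, and risk is deliberately left out, since it needs more than a single-criterion comparison.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProjectOption:
    name: str
    duration_months: float
    cost_usd: int
    risk: float  # assumed scale: 0.0 (safest) to 1.0 (riskiest)


# Made-up candidates: the compressed option is faster but costlier and riskier.
options: List[ProjectOption] = [
    ProjectOption("lean", duration_months=18, cost_usd=900_000, risk=0.45),
    ProjectOption("compressed", duration_months=9, cost_usd=1_600_000, risk=0.75),
    ProjectOption("buffered", duration_months=14, cost_usd=1_200_000, risk=0.30),
]


def least_expensive(candidates: List[ProjectOption]) -> ProjectOption:
    return min(candidates, key=lambda option: option.cost_usd)


def fastest(candidates: List[ProjectOption]) -> ProjectOption:
    return min(candidates, key=lambda option: option.duration_months)


def safest(candidates: List[ProjectOption]) -> ProjectOption:
    return min(candidates, key=lambda option: option.risk)


cheapest_name = least_expensive(options).name
fastest_name = fastest(options).name
safest_name = safest(options).name
```

With these invented numbers, each criterion selects a different option, which is why a single mandated plan cannot satisfy all three at once.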
以下章节将向您展示如何识别良好的项目设计方案。如果您不提供这些方案,您与管理层的冲突只能怪自己。您经常会花功夫设计系统,然后将其提交给管理层,结果却被经理们下达命令:“您有一年的时间和四个开发人员。”那一年、四个开发人员和交付系统的实际需要之间的任何关联都是偶然的——您成功的机会也是如此。但是,如果您提出相同的架构,并附上三到四个项目设计方案,所有这些方案都是可行的,但反映了进度、成本和风险的不同权衡,会议将呈现完全不同的动态。现在讨论将围绕选择这些选项中的哪一个展开。
The following chapters will show you how to identify good project design options. If you do not provide these options, you will have no one to blame but yourself for conflicts with management. How often do you labor on the design of the system and then present it to management, only to have managers ordain, “You have a year and four developers.” Any correlation between that year, the four developers, and what it really takes to deliver the system is accidental—and so are your chances of success. However, if you present the same architecture, accompanied by three to four project design options, all of them doable, but reflecting different tradeoffs of schedule, cost, and risk, a completely different dynamic will rule the meeting. Now the discussion will revolve around which of these options to choose.
你必须提供一个让管理者能够做出正确决策的环境。关键是只为他们提供好的选择。无论他们选择哪个选项,都将是一个好的决定。
You must provide an environment in which managers can make good decisions. The key is to provide them with only good options. Whichever option they do choose will then be a good decision.
项目设计可以让你发现黑暗的角落——也就是说,提前了解项目的真实范围。项目设计迫使管理者在工作开始前仔细考虑,认识到意想不到的关系和限制,代表所有活动,并识别构建系统的几种选择。它允许组织确定是否要完成该项目。毕竟,如果实际成本和持续时间将超过可接受的限度,那么为什么一开始就开始工作,而一旦资金或时间用完,项目就会被取消?
Project design allows you to shed light on dark corners—that is, to have up-front visibility on the true scope of the project. Project design forces managers to think through work before it begins, to recognize unsuspected relationships and limits, to represent all activities, and to recognize several options for building the system. It allows the organization to determine whether it even wants to get the project done. After all, if the true cost and duration will exceed the acceptable limits, why start the work in the first place, only to have the project canceled once you run out of money or time?
一旦项目设计到位,您便可以避免常见的成本赌博、开发死亡行军、对项目成功的痴心妄想以及代价高昂的反复试验。在工作开始后,精心设计的项目还为决策者评估和思考拟议变更对进度和预算的影响奠定了基础。
Once project design is in place, you eliminate the commonplace gambling with costs, development death marches, wishful thinking about project success, and horrendously expensive trials and errors. After work commences, a well-designed project also lays the foundation for decision makers to evaluate and think through the effect of a proposed change on the schedule and the budget.
项目设计涉及的不仅仅是正确的决策。项目设计还充当系统组装说明。打个比方,你会购买一套设计精良的宜家家具,却没有组装说明手册吗?无论这件物品多么舒适或方便,只要想到要猜测数十个销钉、螺栓、螺钉和板子的位置和顺序,你就会感到畏缩。
Project design involves much more than just proper decision making. Project design also serves as the system assembly instructions. To use an analogy, would you buy a well-designed IKEA furniture set without the assembly instructions booklet? Regardless of how comfortable or convenient the item is, you would recoil at the mere thought of trying to guess where each of the dozens of pins, bolts, screws, and plates go, and in which order.
您的软件系统比家具复杂得多,但架构师通常认为开发人员和项目经理可以随心所欲地组装系统,并在过程中解决问题。这种临时方法显然不是组装系统的最有效方法。正如您将在下一章中看到的那样,项目设计改变了这种情况,因为要知道交付系统需要多长时间以及需要多少成本,唯一的方法是先弄清楚如何构建它。因此,每个项目设计选项都带有自己的一套组装说明。
Your software system is significantly more complex than furniture, yet architects often presume that developers and project managers can just go about assembling the system, figuring it out as they go along. This ad hoc approach is clearly not the most efficient way of assembling the system. As you will see in the next chapters, project design changes the situation, since the only way to know how long it will take and how much it will cost to deliver the system is to figure out first how you will build it. Consequently, each project design option comes with its own set of assembly instructions.
1943 年,亚伯拉罕·马斯洛发表了一部关于人类行为的重要著作,即马斯洛需求层次理论。¹ 马斯洛根据人类需求的相对重要性对其进行了排序,并指出,只有当一个人满足了较低层次的需求后,他才会对满足较高层次的需求产生兴趣。这种层次方法可以描述另一类复杂的事物:软件项目。图 6-1以金字塔的形式显示了软件项目的需求层次结构。
In 1943, Abraham Maslow published a pivotal work on human behavior, known as Maslow’s hierarchy of needs.¹ Maslow ranked human needs based on their relative importance and suggested that only once a person has satisfied a lower-level need could that person develop an interest in satisfying a higher-level need. This hierarchical approach can describe another category of complex beings: software projects. Figure 6-1 shows a software project’s hierarchy of needs in the shape of a pyramid.
图 6-1软件项目需求层次
Figure 6-1 Software project hierarchy of needs
1. A.H.马斯洛,《人类动机理论》,《心理学评论》 50,第4期(1943年):370-396。
1. A. H. Maslow, “A Theory of Human Motivation,” Psychological Review 50, no. 4 (1943): 370–396.
项目需求可分为五个层次:物理、安全、可重复性、工程和技术。
The project needs can be classified into five levels: physical, safety, repeatability, engineering, and technology.
物质需求。这是项目需求金字塔中最低的一层,与物质生存有关。就像人必须有空气、食物、水、衣服和住所一样,项目必须有工作场所(即使是虚拟的)和可行的商业模式。项目必须有计算机来编写和测试代码,以及指派人员执行这些任务。项目必须有适当的法律保护。项目不得侵犯现有的外部知识产权 (IP),但也必须保护自己的知识产权。
Physical. This is the lowest level in the project pyramid of needs, dealing with physical survival. Much as a person must have air, food, water, clothing, and shelter, a project must have a workplace (even a virtual one) and a viable business model. The project must have computers to write and test the code, as well as people assigned to perform these tasks. The project must have the right legal protection. The project must not infringe on existing external intellectual property (IP), yet must also protect its own IP.
安全性。一旦满足了物质需求,项目必须有足够的资金(通常以分配的资源形式)和足够的时间来完成工作。工作本身必须在可接受的风险下进行——不能太安全(因为低风险项目可能不值得做)也不能太冒险(因为高风险项目可能会失败)。简而言之,项目必须相当安全。项目设计在需求金字塔的这个层次上进行。
Safety. Once the physical needs are satisfied, the project must have adequate funding (often in the form of allocated resources) and enough time to complete the work. The work itself must be performed with acceptable risk—not too safe (because low-risk projects are likely not worth doing) and not too risky (because high-risk projects will likely fail). In short, the project must be reasonably safe. Project design operates at this level in the pyramid of needs.
可重复性。项目可重复性描述了开发组织一次又一次成功交付的能力,是控制和执行的基础。它确保如果您计划并承诺一定的时间表和成本,您将兑现这些承诺。可重复性体现了团队和项目的可信度。要获得可重复性,您必须管理需求,管理和跟踪项目相对于计划的进度,采用单元和系统测试等质量控制措施,拥有有效的配置管理系统,并积极管理部署和运营。
Repeatability. Project repeatability describes the development organization’s ability to deliver successfully time and again, and is the foundation for control and execution. It assures that if you plan for and commit to a certain schedule and cost, you will deliver on those commitments. Repeatability captures the credibility of the team and the project. To gain repeatability, you must manage requirements, manage and track the project’s progress against the plan, employ quality control measures such as unit and system testing, have an effective configuration management system, and actively manage deployment and operations.
工程。一旦确保了项目工作的可重复性,软件项目就可以首次将注意力转向软件工程的诱人方面。这包括架构和详细设计、质量保证活动(如根本原因分析和纠正(在系统层面))以及使用强化操作程序的预防工作。本书的第一部分专门讨论系统设计,它就处于金字塔的这一层。
Engineering. Once the repeatability of the project effort is secured, the software project can, for the first time, turn its attention to the enticing aspects of software engineering. This includes architecture and detail design, quality assurance activities such as root cause analysis and correction (on a systemic level), and preventive work using hardened operating procedures. The first part of this book, which was devoted to system design, operates at this level of the pyramid.
技术。这一层级包括开发技术、工具、方法、操作系统以及相关的核心技术方面。这是需求金字塔的顶峰,只有当较低的需求得到充分满足时,它才能充分发挥其潜力。
Technology. At this level are development technology, tools, methodology, operating systems, and related hard-core technical aspects. This is the very pinnacle of the pyramid of needs, and it can express itself to its full potential only once the lower needs are fully addressed.
在需求层次中,较高层次的需求服务于较低层次的需求。例如,根据马斯洛的观点,食物是比就业更低层次的需求,但大多数人工作是为了吃饭,而不是吃饭是为了工作。同样,技术服务于工程需求(例如建筑),工程需求服务于安全需求(项目设计提供的需求)。这也意味着,按时间顺序,你必须先设计系统;只有这样,你才能设计项目来构建它。
In a hierarchy of needs, higher-level needs serve the lower-level needs. For example, according to Maslow, food is a lower-level need than employment, yet most people work so that they can eat, as opposed to eat so that they can work. Similarly, the technology serves the engineering needs (such as the architecture), and the engineering needs serve the safety needs (those that project design provides). It also means that chronologically you have to design the system first; only then can you design the project to build it.
您可以通过列出典型软件项目成功所需的所有必要因素来验证金字塔。然后,您可以对它们进行优先排序、排序,最后将它们分组到需求类别中。
You can validate the pyramid by listing all necessary ingredients for success of a typical software project. You then prioritize, sort, and finally group them into categories of needs.
作为此过程中的一项实验,请考虑以下两个项目。第一个项目的设计紧密耦合,维护成本高,重用程度低,并且难以扩展。但是,有足够的时间来完成工作,并且项目人员配备适当。第二个项目具有令人惊叹的架构,它是模块化的、可扩展的和可重用的;满足所有要求;并且面向未来。但是,团队人手不足,即使有人员可用,也没有足够的时间来安全地开发系统。问问自己:您想参与哪个项目?
As an experiment in this process, consider the following two projects. The first has a tightly coupled design, a high cost of maintenance, and a low level of reuse, and is difficult to extend. However, there is adequate time to perform the work, and the project is properly staffed. The second project has an amazing architecture that is modular, extensible, and reusable; addresses all the requirements; and is future-proof. However, the team is understaffed, and even if the people were available, there is not enough time to safely develop the system. Ask yourself: Which project would you like to be part of?
答案毫无疑问是第一个项目。因此,项目设计在需求金字塔中的排名必须低于架构(即更基础)。许多软件项目失败的一个典型原因是需求金字塔倒置。想象一下图 6-1颠倒过来。开发团队几乎只关注技术、框架、库和平台;几乎没有在架构和设计上花费任何资金;完全忽略了时间、成本和风险等基本问题。这使得需求金字塔不稳定,这样的项目失败也就不足为奇了。通过使用项目设计工具投资金字塔的安全级别,您可以分层项目的需求,提供稳定上层的基础,并推动项目走向成功。
The answer is unequivocally the first project. Consequently, project design must rank lower (i.e., be more foundational) in the pyramid of needs than architecture. A classic reason for failure for many software projects is an inverted pyramid of needs. Imagine Figure 6-1 turned on its head. The development team focuses almost exclusively on technology, frameworks, libraries, and platforms; spends next to nothing on architecture and design; and completely ignores the fundamental issues of time, cost, and risk. This makes the pyramid of needs unstable, and it is small wonder that such projects fail. By investing in the safety level of the pyramid using the tools of project design, you stratify the needs of the project, provide the foundation that stabilizes the upper levels, and drive the project to success.
以下概述介绍了设计项目时应用的基本方法和技术。良好的项目设计包括人员配备计划、范围和工作量估算、服务的构建和集成计划、详细的活动时间表、成本计算、计划的可行性和验证以及执行和跟踪的设置。
The following overview describes the basic methodology and techniques you apply when designing a project. A good project design includes your staffing plan, scope and effort estimations, the services’ construction and integration plan, the detailed schedule of activities, cost calculation, viability and validation of the plan, and setup for execution and tracking.
本章涵盖了项目设计的大部分概念,同时将某些细节和一两个关键概念留到后面的章节中。然而,尽管它只是一个概述,但本章包含了成功设计和交付软件项目的所有基本要素。它还为设计活动提供了开发过程动机,而其余章节则更具技术性。
This chapter covers most of these concepts of project design, while leaving certain details and one or two crucial concepts for later chapters. However, even though it serves as a mere overview, this chapter contains all the essential elements of success in designing and delivering software projects. It also provides the development process motivation for the design activities, while the rest of the chapters are more technical in nature.
在继续阅读之前,您必须了解项目设计关乎成功以及成功所需的条件。整个软件行业的业绩记录非常糟糕,以至于该行业已经改变了其对成功的定义:如今,成功被定义为任何不会让公司破产的事情。由于门槛如此之低,从低质量到欺骗性的数字和沮丧的客户,任何事情都可能发生,而且什么都无关紧要。我对成功的定义不同,尽管它本身也是一个低门槛。我将成功定义为履行承诺。
Before you continue reading, you must understand that project design is about success and what it takes to succeed. The software industry at large has had such a poor track record that the industry has changed its very definition of success: Success today is defined as anything that does not bankrupt the company right now. With such a low bar, literally anything goes and nothing matters, from low quality to deceiving numbers and frustrated customers. My definition of success is different, though it is also a low bar in its own way. I define success as meeting your commitments.
如果你要求该项目耗时一年,成本为 100 万美元,我预计该项目耗时一年,而不是两年,项目成本为 100 万美元,而不是 300 万美元。在软件行业,许多人缺乏达到这一低成功标准所需的技能和培训。本章中提出的想法就是为了实现这一点。
If you call for a year for the project and $1 million in costs, I expect the project to take one year, not two, and for the project to cost $1 million, not $3 million. In the software industry, many people lack the skills and training it takes to meet even this low bar for success. The ideas presented in this chapter are all about accomplishing just that.
更高的标准是以最快、最便宜、最安全的方式交付项目。这样的更高标准需要以下章节中描述的技术。您可以进一步提高标准,要求系统架构在几十年内保持良好状态,并在其整个漫长而繁荣的生命周期中保持可维护、可重用、可扩展和安全。这不可避免地需要本书第一部分的设计思想。因为一般来说,你需要先学会走路,然后才能跑步,所以最好从基本的成功开始,然后逐步提高。
A higher bar is to deliver the project the fastest, least costly, and safest way. Such a higher bar requires the techniques described in the following chapters. You can raise the bar even further and call for having the system architecture remain good for decades and be maintainable, reusable, extensible, and secure across its entire long and prosperous life. That would inevitably require the design ideas of the first part of this book. Since, in general, you need to walk before you can run, it is best to start with the basic level of success and work your way up.
本书的第 1 部分阐述了一个通用的设计规则:特性始终是集成而非实现的方面。因此,早期的服务中都没有特性。在某个时候,您将集成到足以开始看到特性。我将这个点称为系统。系统不太可能出现在项目的最后,因为可能会有一些额外的结束活动,如系统测试和部署。系统通常出现在最后,因为它需要大多数服务以及客户端。当使用方法时,这意味着只有在管理器、引擎、资源访问和实用程序内部集成后,您才能支持客户端所需的行为。
Part 1 of this book stated a universal design rule: Features are always and everywhere aspects of integration, not implementation. As such, there are no features in any of the early services. At some point you will have integrated enough to start seeing features. I call that point the system. The system is unlikely to appear at the very end of the project since there may be some additional concluding activities such as system testing and deployment. The system typically appears toward the end because it requires most of the services as well as the clients. When using The Method, this means only once you have integrated inside the Managers, the Engines, the ResourceAccess, and the Utilities can you support the behaviors that the Clients require.
虽然系统是集成的产物,但并非所有集成都发生在管理器内部。有些集成发生在管理器完成之前(例如引擎集成ResourceAccess),有些集成发生在管理器之后(例如客户端和管理器之间)。还可能存在显式集成活动,例如针对模拟器开发服务客户端,然后将客户端与实际服务集成。
While the system is the product of the integration, not all the integration happens inside the Managers. Some integration happens before the Managers are complete (such as the Engines integrating ResourceAccess) and some integration happens after the Managers (such as between the Clients and the Managers). There might also be explicit integration activities, such as developing a client of a service against a simulator and then integrating the client with the real service.
系统只在项目结束时出现的问题在于管理层的反对。大多数负责管理软件开发的人都不理解本书中的设计概念,只是想要功能。他们永远不会停下来想一想,如果一个功能可以尽早快速出现,那么它就不会为企业或客户增加太多价值,因为公司或团队没有在该功能上花费太多精力。通常,管理层使用功能作为衡量进度和成功的指标,并倾向于取消没有进展的病态项目。因此,该项目面临着严重的风险:它可能完全按计划进行,但由于系统只在最后出现,如果项目根据功能来报告进度,那么它就是在要求取消。解决方案很简单:
The problem with the system appearing only toward the end of the project is pushback from management. Most people tasked with managing software development do not understand the design concepts in this book and simply want features. They would never stop to think that if a feature can appear early and quickly, then it does not add much value for the business or the customers because the company or the team did not spend much effort on the feature. Usually, management uses features as the metric to gauge progress and success, and tends to cancel sick projects that do not show progress. As such, the project faces a serious risk: It could be perfectly on schedule but because the system only appears at the end, if the project bases its progress report on features, it is asking to be canceled. The solution is simple:
永远不要根据功能来报告进度。始终根据集成来报告进度。
Never base progress reports on features. Always base progress reports on integration.
基于方法的项目在项目过程中会进行大量集成。这些集成规模小且可行。因此,项目有可能不断传出好消息,建立信任并避免取消。
A Method-based project performs a lot of integration along the project. These integrations are small and doable. As a result, there is the potential for a constant stream of good news coming out of the project, building trust and avoiding cancellation.
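One way to picture integration-based reporting is the small Python sketch below. The `Integration` type, the list of integration points, and the rule that a feature becomes visible only when all of its required integrations are done are assumptions chosen for illustration; they are not a tracking technique prescribed by the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Integration:
    name: str
    done: bool


def integration_progress(plan: List[Integration]) -> float:
    """Fraction of planned integration points completed so far."""
    if not plan:
        return 0.0
    return sum(1 for step in plan if step.done) / len(plan)


def feature_visible(required: FrozenSet[str], plan: List[Integration]) -> bool:
    """A feature shows up only when every integration it depends on is done."""
    completed = {step.name for step in plan if step.done}
    return required <= completed


# Hypothetical plan: lower layers are integrated first, Clients last.
plan: List[Integration] = [
    Integration("ResourceAccess + Resource", done=True),
    Integration("Engine + ResourceAccess", done=True),
    Integration("Manager + Engine", done=True),
    Integration("Manager + ResourceAccess", done=False),
    Integration("Client + Manager", done=False),
]

# Assume one feature that needs the whole chain up to the Client.
feature_needs: FrozenSet[str] = frozenset(step.name for step in plan)

progress = integration_progress(plan)
visible = feature_visible(feature_needs, plan)
```

In this made-up plan, a feature-based report would show nothing delivered, while an integration-based report shows three of five integration points complete.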
良好的架构不会自行形成,不会偶然出现,也不会在合理的时间或成本内自然出现。良好的架构是软件架构师深思熟虑的结果。因此,任何软件项目的第一个明智之举就是为项目指派一名合格且称职的架构师。除此之外别无他法,因为任何项目的主要风险都是没有架构师负责架构。这种风险远远超过项目面临的任何其他初始风险。无论开发人员的技术敏锐度如何,技术有多成熟,开发环境有多完善,都无关紧要。如果系统设计有缺陷,所有这些都将毫无意义。用房子来打比方,你想用最好的材料、最好的施工队、在最好的位置建造一栋房子,但没有任何建筑或建筑有缺陷吗?
A good architecture does not happen on its own, is not a happenstance, and does not emerge organically in any reasonable amount of time or cost. Good architecture is the result of deliberate effort by the software architect. As such, the first act of wisdom in any software project is to assign a qualified and competent architect to the project. Nothing less will do, because the principal risk in any project is not having an architect accountable for the architecture. This risk far eclipses any other initial risk the project faces. It does not matter what the level of the developers’ technical acumen is, how mature the technology is, or how pampering the development environment is. None of these will amount to anything if the system design is flawed. To use the house analogy, would you like to build a house from the best material, with the best construction crew, at the best location, but without any architecture or with a flawed architecture?
架构师需要花时间收集和分析需求,确定核心用例和易变性领域,并进行系统和项目设计。虽然设计本身并不耗时(架构师通常可以在一两周内设计系统和项目),但可能需要几个月的时间才能达到架构师可以设计系统和项目的程度。
The architect will need to spend time gathering and analyzing the requirements, identifying the core use cases and the areas of volatility, and producing the system and the project design. While the design itself is not time-consuming (the architect can usually design both the system and the project in a week or two), it may take several months to get to the point that the architect can design the system and the project.
大多数经理都会对花费三四个月的时间进行设计或完全跳过设计感到不安。他们可能希望通过让更多架构师参与来加速设计工作。然而,需求分析和架构是需要深思熟虑、耗时的活动。分配更多架构师对这些活动的监督并不能加速这些活动,反而会使情况变得更糟。架构师通常是自信的高级人员,习惯于独立工作。指派多个架构师只会导致他们相互竞争,而不是在系统和项目设计蓝图上竞争。
Most managers will recoil both at spending some three or four months on design and at skipping design entirely. They may wish to accelerate the design effort by having more architects participate. However, requirements analysis and architecture are contemplative, time-consuming activities. Assigning more architects to these activities does not expedite them at all, but instead makes matters worse. Architects are typically senior, self-confident personnel who are used to working independently. Assigning multiple architects only results in them contesting with each other, rather than in system and project design blueprints.
解决多架构师冲突的一种方法是任命一个设计委员会。不幸的是,任命一个委员会来监督它是毁掉任何东西的最可靠方法。另一种选择是分割系统并指派每个架构师设计一个特定的区域。如果采用这种方式,系统最终可能会变成奇美拉——一种希腊神话中的野兽,有狮子的头、龙的翅膀、牛的前腿和山羊的后腿。虽然奇美拉的每个部分都设计精良,甚至经过高度优化,但奇美拉在尝试的任何事情上都表现不佳:它飞得不如龙,跑得不如狮子,拉得不如牛,爬得不如山羊。奇美拉缺乏设计完整性——当多个架构师设计系统时也是如此,每个人都负责自己的部分。
One way of resolving the multiple architects conflict is to appoint a design committee. Unfortunately, the surest way of killing anything is to appoint a committee to oversee it. Another option is to carve up the system and assign each architect a specific area to design. With this option, the system is likely to end as a Chimera—a mythological Greek beast that has the head of a lion, the wings of a dragon, the front legs of an ox, and the hind legs of a goat. While each part of the Chimera is well designed and even highly optimized, the Chimera is inferior at anything it attempts: It does not fly as well as a dragon, run as fast as a lion, pull as much as an ox, or climb as well as a goat. The Chimera lacks design integrity—and the same is true when multiple architects design the system, each responsible for their part.
单一架构师对于设计完整性而言绝对至关重要。您可以将此观察结果扩展到一般规则,即实现设计完整性的唯一方法是让单一架构师负责设计。反之亦然:如果没有一个人负责设计并能够从头到尾可视化设计,则系统将不具备设计完整性。
A single architect is absolutely crucial for design integrity. You can extend this observation to the general rule that the only way to allow for design integrity is to have a single architect own the design. The opposite is also true: If no single person owns the design and can visualize it cover-to-cover, the system will not have design integrity.
此外,由于有多个架构师,因此没有人负责中间环节、跨子系统甚至跨服务的设计方面。因此,没有人对整个系统设计负责。当没有人对某件事负责时,这件事就永远不会完成,或者最多只能做得很差。
Additionally, with multiple architects no one owns the in-betweens, the cross-subsystem or even cross-services design aspects. As a result, no one is accountable for the system design as a whole. When no one is accountable for something, it never gets done, or at best is done poorly.
如果只有一位架构师负责,那么该架构师将负责系统设计。归根结底,负责任是赢得管理层尊重和信任的唯一途径。尊重总是源于负责任。当没有人负责任时,就像一群架构师的情况一样,管理层本能地对架构师及其设计工作只会嗤之以鼻。
With a single architect in charge, that architect is accountable for the system design. Ultimately, being accountable is the only way to earn the respect and trust of management. Respect always emerges out of accountability. When no one is accountable, as is the case with a group of architects, management often intuitively has nothing but scorn for the architects and their design effort.
大多数软件项目只需要一名架构师。无论项目规模如何,这都是正确的,并且对于成功至关重要。但是,大型项目很容易让架构师承担各种责任,从而使架构师无法专注于设计系统的关键目标,也无法防止设计在开发过程中偏离目标。此外,架构师的角色还包括技术领导、需求审查、设计审查、系统中每个服务的代码审查、设计文档更新、讨论营销部门的功能请求等。
Most software projects need only a single architect. This is true regardless of the project size and is essential for success. However, large projects very easily saturate the architect with various responsibilities, preventing the architect from focusing on the key goal of designing the system and keeping the design from drifting away during development. Additionally, the role of architect involves technical leadership, requirements review, design review, code review for each service in the system, design documents updates, discussion of feature requests from marketing, and so on.
管理层可以通过为项目指派一名(或多名)初级架构师来解决这种超负荷问题。架构师可以将许多次要任务分派给初级架构师,让架构师在开始时专注于系统和项目的设计,并在整个项目过程中保持系统符合其设计。架构师和初级架构师不太可能发生竞争,因为毫无疑问谁是负责人,而且职责界限明确。拥有初级架构师也是为组织培养和指导下一代架构师的好方法。
Management can address this overload by assigning a junior architect (or more than one) to the project. The architect can offload many secondary tasks to the junior architect, allowing the architect to focus on the design of the system and the project at the beginning and on keeping the system true to its design throughout the project. The architect and the junior architect are unlikely to compete because there is no doubt who is in charge, and there are clearly delineated lines of responsibilities. Having junior architects is also a great way of grooming and mentoring the next generation of architects for the organization.
尽管架构师对项目至关重要,但架构师不能孤立地工作。在第一天,项目必须有一个核心团队。核心团队由三个角色组成:项目经理、产品经理和架构师。这些是逻辑上的角色,可能对应也可能不对应三个人。如果没有对应,您可能会看到同一个人既是架构师又是项目经理,或者一个项目有多个产品经理。
As vital as the architect is to the project, the architect cannot work in isolation. On day 1, the project must have a core team in place. The core team consists of three roles: project manager, product manager, and architect. These are logical roles and may or may not map to three individuals. When they do not, you may see the same person as both the architect and the project manager, or a project with several product managers.
大多数组织和团队都有这些角色,但他们使用的职位名称可能不同。我将这些角色定义如下:
Most organizations and teams have these roles, but the job titles they use may be different. I define these roles as follows:
项目经理。项目经理的工作是保护团队不受组织的影响。大多数组织,即使是小组织,也会产生太多噪音。如果这些噪音进入开发团队,就会使团队陷入瘫痪。优秀的项目经理就像一道防火墙,可以阻挡噪音,只允许经过批准的沟通通过。项目经理跟踪进度并向管理层和其他项目经理报告状态,协商条款,并处理跨组织约束。在内部,项目经理将工作项目分配给开发人员,安排活动,并确保项目按时、按预算、按质量进行。除了项目经理之外,组织中的任何人都不应分配工作活动或向开发人员询问状态。
The project manager. The job of the project manager is to shield the team from the organization. Most organizations, even small ones, create too much noise. If that noise makes its way into the development team, it can paralyze the team. A good project manager is like a firewall, blocking the noise, allowing only sanctioned communication through. The project manager tracks progress and reports status to management and other project managers, negotiates terms, and deals with cross-organization constraints. Internally, the project manager assigns work items to developers, schedules activities, and keeps the project on schedule, on budget, and on quality. No one in the organization other than the project manager should assign work activity or ask for status from developers.
产品经理。产品经理应该囊括客户。客户也是一个持续的噪音源。产品经理充当客户的代理。例如,当架构师需要明确所需的行为时,架构师不应该追逐客户;相反,产品经理应该提供答案。产品经理还解决客户之间的冲突(通常表现为相互排斥的需求)、协商需求、定义优先级,并传达关于什么是可行的以及在什么条件下可行的期望。
The product manager. The product manager should encapsulate the customers. Customers are also a constant source of noise. The product manager acts as a proxy for the customers. For example, when the architect needs to clarify the required behaviors, the architect should not chase customers; instead, the product manager should provide the answers. The product manager also resolves conflicts between customers (often expressed as mutually exclusive requirements), negotiates requirements, defines priorities, and communicates expectations about what is feasible and on what terms.
架构师。架构师是技术经理,担任项目的设计负责人、流程负责人和技术负责人。架构师不仅设计系统,还要监督整个开发过程。架构师需要与产品经理合作完成系统设计,与项目经理合作完成项目设计。虽然与产品经理和项目经理的合作至关重要,但架构师要对这两项设计工作负责。作为流程负责人,架构师必须确保团队逐步构建系统,遵循系统和项目设计,并坚持不懈地保证质量。作为技术负责人,架构师通常必须决定完成技术任务的最佳方式(做什么),而将细节(如何做)留给开发人员。这需要持续的实践指导、培训和评审。
The architect. The architect is the technical manager, acting as the design lead, the process lead, and the technical lead of the project. The architect not only designs the system, but also sees it through development. The architect needs to work with the product manager to produce the system design and with the project manager to produce the project design. While the collaboration with both the product manager and the project manager is essential, the architect is held responsible for both of these design efforts. As a process lead, the architect has to ensure the team builds the system incrementally, following the system and the project design with a relentless commitment to quality. As a technical lead, the architect often has to decide on the best way of accomplishing technical tasks (the what-to-do) while leaving the details (the how-to-do) to developers. This requires continuous hands-on mentoring, training, and reviews.
核心团队定义中最明显的遗漏可能就是开发人员。开发人员(和测试人员)是跨项目来来去去的临时资源——这是本章在讨论活动安排和资源分配时重新讨论的一个非常重要的点。
Perhaps the most glaring omission from this definition of the core team is developers. Developers (and testers) are transient resources that come and go across projects, a very important point that this chapter revisits as part of the discussion of scheduling activities and resource assignment.
与开发人员不同,核心团队会一直参与整个项目,因为项目从头到尾都需要这三个角色。但是,这些角色在项目中的作用会随着时间的推移而变化。例如,项目经理从与利益相关者协商转变为提供状态报告,产品经理从收集需求转变为执行演示。架构师从设计系统和项目转变为提供持续的技术和流程领导,例如在服务级别进行设计和代码审查以及解决技术冲突。
Unlike developers, the core team stays throughout the project since the project needs all three roles from beginning to end. However, what these roles do in the project changes over time. For example, the project manager shifts from negotiating with stakeholders to providing status reports, and the product manager shifts from gathering requirements to performing demos. The architect shifts from designing the system and the project to providing ongoing technical and process leadership, such as conducting design and code reviews at the service level and resolving technical conflicts.
核心团队最初的任务是设计项目。这意味着要可靠地回答需要多长时间以及需要花费多少的问题。没有项目设计就不可能知道这些关键问题的答案,而要设计项目,就需要架构。在这方面,架构只是实现项目设计这一目的的一种手段。由于架构师需要与产品经理合作进行架构,与项目经理合作进行项目设计,因此项目在开始阶段就需要核心团队。
The mission of the core team at the beginning is to design the project. This means reliably answering the questions of how long it will take and how much it will cost. It is impossible to know the answers to these key questions without project design, and to design the project you require the architecture. In this respect, the architecture is merely a means to an end: project design. Since the architect needs to work with the product manager on the architecture and with the project manager on the project design, the project requires the core team at the beginning of the project.
核心团队在开发前期设计项目。模糊前端是所有技术项目中的通用术语1,指项目一开始。前端始于某人对项目有想法时,终于开发人员开始构建时。前端通常比大多数人意识到的要长得多:当他们参与项目时,前端可能已经进行了好几年。项目之间存在很大程度的差异,这导致前端的确切持续时间模糊不清。前端的持续时间主要取决于对项目施加的限制。项目的限制越多,您需要在前端花费的时间就越少。相反,限制越少,您就应该投入越多的时间来弄清楚未来会发生什么以及如何去做。
The core team designs the project in the fuzzy front end leading to development. The fuzzy front end is a general term1 in all technical projects referring to the very start of the project. The front end commences when someone has an idea about the project, and it concludes when developers start construction. The front end often lasts considerably longer than most people recognize: By the time they become involved in the project, the front end may have been in progress for several years. There is a large degree of variance across projects, which leads to the fuzziness about the exact duration of the front end. The duration of the front end is most heavily dependent on the constraints applied to the project. The more constrained the project is, the less time you need to spend in the front end. Conversely, the fewer the constraints, the more time you should invest in figuring out what lies ahead and how to go about it.
1. https://en.wikipedia.org/wiki/Front_end_innovation
1. https://en.wikipedia.org/wiki/Front_end_innovation
软件项目从来都不是没有约束的。所有项目都面临着时间、范围、工作量、资源、技术、遗留问题、业务环境等方面的约束。这些约束可以是显性的,也可以是隐性的。花时间验证显性约束和发现隐性约束至关重要。设计违反约束的系统和项目注定会失败。根据我的经验,软件项目应该将整个项目持续时间的大约 15% 到 25% 的时间花在前端,具体取决于约束。
Software projects are never constraint-free. All projects face some constraints on time, scope, effort, resources, technology, legacy, business context, and so on. These constraints can be explicit or implicit. It is vital to invest the time in both verifying the explicit constraints and discovering the implicit constraints. Designing a system and project that violates a constraint is a recipe for failure. From my experience, a software project should spend roughly between 15% and 25% of the entire duration of the project in the front end, depending on constraints.
如果不知道项目的真实进度、成本和风险,就批准该项目是毫无意义的。毕竟,你不会在不知道房子多少钱的情况下买房。你不会买一栋你负担得起但无法支付维护和税费的房子。在任何行业,很明显,你只有在了解范围后才会投入时间和资金。许多软件项目在不了解实际所需时间和成本的情况下鲁莽地继续进行。
It is pointless to approve a project without knowing its true schedule, cost, and risk. After all, you would not buy a house without knowing how much it costs. You would not buy a house that you can afford up front but whose upkeep and taxes you cannot pay. In any walk of life, it is obvious that you commit time and capital only after the scope is known. Many software projects recklessly proceed with no idea of the real time and cost required.
在组织承诺项目并确定有足够的时间和资金之前,为项目配备资源也是毫无意义的。事实上,在做出承诺之前就为项目配备人员往往会不顾经济承受能力强行推进项目。如果正确的做法是从一开始就避免开展项目,那么组织只会浪费大量资金。匆忙投入资源几乎总是伴随着糟糕的功能设计和毫无计划——这几乎不是成功的要素。
It is just as pointless to staff a project with resources before the organization is committed to the project and certain to have the required time and money. In fact, staffing a project before the commitment is made has a tendency to force the project ahead regardless of affordability. If the right thing to do is to avoid doing the project in the first place, the organization will only be wasting good money. A rush to commit the resources will almost always be accompanied by a poor functional design and no plan at all—hardly the ingredients of success.
成功的关键在于根据合理的设计和范围计算做出明智的决策。一厢情愿的想法不是策略,直觉也不是知识,尤其是在处理复杂的软件系统时。
The key to success is to make educated decisions, based on sound design and scope calculations. Wishful thinking is not a strategy, and intuition is not knowledge, especially when dealing with complex software systems.
项目设计的结果是一组计划,而不是单个计划。如上一章所述,项目计划不是时间和成本的单一坐标。构建任何系统总是有多种可能的方式,只有一种选择才能提供时间、成本和风险的正确组合。架构师可能会倾向于简单地询问管理层项目的设计参数是什么,然后只设计那个单一选项。问题是,管理人员通常不会说出他们的意思,也不会真正表达他们的意思。
The result of project design is a set of plans, not a single plan. As described in the previous chapter, the project plan is not a single coordinate of time and cost. There are always multiple possible ways of building any system, and only one option will offer the right combination of time, cost, and risk. The architect may be tempted to simply ask management what the design parameters of the project are and just design that single option. The problem is that managers often do not say what they mean or mean what they say.
例如,考虑一个 10 人年的项目,即所有活动的工作量总和为 10 人年的项目。假设管理层要求以成本最低的方式构建系统。这样的项目将让一个人工作 10 年,但管理层不太可能愿意等待 10 年。现在假设管理层要求以最快的方式构建系统。想象一下,可以通过让 3650 人工作 1 天(甚至让 365 人工作 10 天)来构建相同的系统。管理层不太可能在这么短的时间内雇用这么多人。同样,管理层永远不会要求以最安全的方式构建系统(因为任何值得做的事情都需要风险,而安全的项目不值得做)或故意选择最冒险的方式开展项目。
For example, consider a 10-man-year project—that is, a project where the sum of effort across all activities is 10 man-years. Suppose management asks for the least costly way of building the system. Such a project would have one person working for 10 years, but management is unlikely to be willing to wait 10 years. Now suppose that management asks for the quickest possible way to build the system. Imagine it is possible to build the same system by engaging 3650 people for 1 day (or even 365 people for 10 days). Management is unlikely to hire so many people for such short durations. Similarly, management will never ask for the safest way of building the system (because anything worth doing requires risk, and safe projects are not worth doing) or knowingly go for the riskiest way of doing the project.
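The arithmetic behind these extremes can be sketched in a few lines. This is only an illustration of the naive assumption that effort divides perfectly among people, which is exactly the assumption the paragraph above shows to be unrealistic; the function name `staffing_options` and the 365-day year are choices made for this sketch, not part of the text's method.

```python
DAYS_PER_YEAR = 365  # calendar days, matching the 3650-people-for-1-day figure


def staffing_options(effort_man_years, team_sizes):
    """Return (people, duration_in_days) pairs, naively assuming that
    effort can be split evenly and without overhead across the team."""
    total_man_days = effort_man_years * DAYS_PER_YEAR
    return [(people, total_man_days / people) for people in team_sizes]


options = staffing_options(10, [1, 365, 3650])
for people, days in options:
    print(f"{people:>5} people -> {days:>7.1f} days")
```

Every row carries the same 10 man-years of effort, yet none of these extremes is something management would actually choose, which is why a range of viable options is needed.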
解决管理层真正想要什么的模糊性的唯一方法是提供一系列好的选项供他们选择,每个选项都是时间、成本和风险的可行组合。你可以在一个非正式的称为“喂我/杀了我”的专门会议上向管理层介绍这些选项。顾名思义,此会议的目的是让管理层选择其中一个项目设计方案并投入所需资源(“喂我”路线)。选项之一始终是不做这个项目(“杀了我”路线)。正式的会议名称应该是软件开发计划评审,或 SDP 评审。如果你的流程没有 SDP 评审点也没关系:只需召开会议(没有经理可以拒绝主题为“软件开发计划评审”的会议请求)。
The only way to resolve the ambiguity about what management really wants is to present a buffet of good options from which to choose, with each option being a viable combination of time, cost, and risk. You present these options to management in a dedicated meeting unofficially called the Feed Me/Kill Me meeting. As the name implies, the purpose of this meeting is for management to choose one of the project design options and commit the required resources (the “Feed Me” route). One of the options is always that of not doing the project (the “Kill Me” route). Officially, the name of the meeting should be the Software Development Plan Review, or SDP review. It makes no difference if your process does not have an SDP review point: Just call a meeting (no manager can refuse a meeting request whose subject line is “Software Development Plan Review”).
一旦确定了所需的选项,管理层必须签署 SDP 文件。该文件现在成为您项目的人寿保险单,因为只要您不偏离计划的参数,就没有理由取消您的项目。这确实需要适当的跟踪(如附录 A所述)和项目管理。
Once the desired option is identified, management must literally sign off on the SDP document. This document now becomes your project’s life insurance policy because, as long as you do not deviate from the plan’s parameters, there is no reason to cancel your project. This does require proper tracking (as described in Appendix A) and project management.
如果没有可行的选择,那么你需要做出正确的决定——在这种情况下,就是终止项目。一个注定失败的项目,一个从一开始就没有得到足够时间和资源的项目,对任何人都没有好处。该项目最终会耗尽时间或金钱,或两者兼而有之,组织不仅会浪费资金和时间,还会浪费将这些资源投入另一个可行项目的机会成本。参与一个永远没有机会的项目对核心团队成员的职业生涯也有不利影响。由于你只有几年的时间来取得成就并向前迈进,所以每个项目都必须有意义,成为你的荣誉。花一两年时间在一个失败的横向行动上会限制你的职业前景。在开发开始之前终止这样的项目对所有相关人员都有好处。
If no option is palatable, then you need to drive the right decision—in this case, killing the project. A doomed project, a project that from inception did not receive adequate time and resources, will do no one any good. The project will eventually run out of time or money or both, and the organization will have wasted not just the funds and time but the opportunity cost of devoting these resources to another doable project. It is also detrimental to the careers of the core team members to be on a project that never has a chance. Since you have only a few years to make your mark and move ahead, every project must count and be a feather in your cap. Spending a year or two on a sideways move that failed will limit your career prospects. Killing such a project before development starts is beneficial for all involved.
有了项目设计(但只有在管理层选择了特定选项之后),团队就可以开始构建系统了。通常,这需要将服务(或模块、组件、类等)分配给开发人员。确切的分配方法值得在本章后面单独讨论。现在,您只需认识到应该始终以 1:1 的比例将服务分配给开发人员。1:1 的比例并不意味着开发人员在整个项目中只负责一项服务,而是指如果您在任一时刻对团队做一个横截面,您将看到每位开发人员只在处理一项服务。开发人员完成一项服务后再转到下一项服务,这完全没问题。但是,您永远不应该看到一位开发人员同时处理多项服务,或多位开发人员同时处理同一项服务。任何其他将服务分配给开发人员的方式都会导致失败。糟糕的分配选项的示例包括:
With the project design in hand (but only after management has chosen a specific option), the team can start constructing the system. Typically this requires assigning services (or modules, components, classes, etc.) to developers. The exact assignment methodology deserves a section on its own later on in the chapter. For now, recognize that you should always assign services to developers in a 1:1 ratio. The 1:1 ratio does not mean that a developer works on only one service, but rather that if you do a cross-section of the team at any moment in time, you will see a developer working on one and only one service. It is perfectly fine for a developer to finish one service and move to the next. However, you should never see a developer working on more than one service at a time or more than one developer working concurrently on the same service. Any other way of assigning services to developers will result in failure. Examples of the poor assignment options include:
每项服务有多名开发人员。为一项服务指派两名(或更多)开发人员的动机不是开发人员过剩,而是希望尽快完成工作。但是,两个人实际上不可能同时处理同一件事,因此必须使用一些子方案:
– 序列化。开发人员可以按顺序工作,这样每次只有一个开发人员在处理服务。由于上下文切换开销(即需要弄清楚自当前开发人员上次查看服务以来发生了什么),这会花费更长的时间。这违背了最初分配两名开发人员的目的。
– 并行化。开发人员可以并行工作,然后整合他们的工作。这种方案将比仅让一名开发人员开发服务花费更长的时间。例如,假设一项估计需要一个月时间的服务被分配给两名并行工作的开发人员。人们可能会认为这项工作将在两周后完成,但这是一个错误的假设。首先,并非所有工作单元都可以以这种方式划分。其次,开发人员必须至少再分配一周时间来整合他们的工作。如果开发人员并行工作并且在开发过程中没有协作,那么这种整合根本无法保证成功。即使可以进行整合,由于整合更改,每个部分的所有测试工作也会失效。测试整个服务也需要额外的时间。总而言之,这项工作将至少需要一个月(可能更长)。同时,正在开发依赖服务并期望服务在两周后准备就绪的其他开发人员将进一步延迟。
Multiple developers per service. The motivation for assigning two (or more) developers to one service is not a surplus of developers, but rather the desire to complete the work sooner. However, two people cannot really work on the same thing at the same time, so some subscheme must be used:
– Serialization. The developers could work serially so that only one of them is working on the service at a time. This takes longer due to the context switch overhead—that is, the need to figure out what happened with the service since the current developer looked at it last. This defeats the purpose of assigning the two developers in the first place.
– Parallelization. The developers could work in parallel and then integrate their work. This scheme will take much longer than just having a single developer working on the service. For example, suppose a service estimated as one month of effort is assigned to two developers who will work in parallel. One might be tempted to assume that the work will be complete after two weeks, but that is a false assumption. First, not all units of work can be split this way. Second, the developers would have to allocate at least another week to integrate their work. This integration is not at all guaranteed to succeed if the developers worked in parallel and did not collaborate during development. Even if the integration is possible, it would void all the testing effort that went into each part due to the integration changes. Testing the service as a whole also would require additional time. In all, the effort will take at least a month (and likely more). Meanwhile, other developers who are working on dependent services and expect the service to be ready after two weeks will be further delayed.
每位开发人员负责多项服务。将两项(或更多)服务分配给一位开发人员同样糟糕。假设有两项服务 A 和 B,估计每项需要一个月的工作量,它们被分配给同一位开发人员,并期望该开发人员在一个月后完成这两项服务。由于工作量总和为两个月,不仅这些服务在一个月后无法完成,而且完成它们所需的时间会长得多。当开发人员在开发 A 服务时,就没有在开发 B 服务,这会导致依赖 B 服务的开发人员要求该开发人员去做 B 服务。开发人员可能会切换到 B 服务,但随后依赖 A 服务的人又会要求得到关注。所有这些来回切换都会极大地降低开发人员的效率,使工期延长到远超过两个月。最终,也许在三四个月之后,A 服务和 B 服务才可能完成。
Multiple services per developer. The option of assigning two (or more) services to a single developer is just as bad. Suppose two services, A and B, each estimated as a month of work, are assigned to a single developer, with the developer expected to finish both after a single month. Since the sum of work is two months, not only will the services be incomplete after one month, but finishing them will take much longer. While the developer is working on the A service, the developer is not working on the B service, causing the developers dependent on the B service to demand that the developer work on the B service. The developer might switch to the B service, but then those dependent on the A service would demand some attention. All this switching back and forth drastically reduces the developer’s efficiency, prolonging the duration to much more than two months. In the end, perhaps after three or four months, the A and B services may be complete.
无论是为每项服务分配多名开发人员,还是为每位开发人员分配多项服务,都会导致项目出现一连串延迟,这主要是由于延迟的依赖关系影响了其他开发人员。这反过来又使得准确的估算变得非常困难。唯一具有某种责任感并有可能达到估算的选项是按 1:1 的比例将服务分配给开发人员。
Either assigning more than one developer per service or assigning multiple services per developer causes a mushroom cloud of delays to propagate throughout the project, mostly due to delayed dependencies affecting other developers. This, in turn, makes accurate estimations very difficult. The only option that has any semblance of accountability and a chance of meeting the estimation is a 1:1 assignment of services to developers.
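One way to make the 1:1 rule checkable is to take a snapshot of who is working on what at a given moment and look for both kinds of violation. The sketch below is a hypothetical helper, not a tool described in the text; the `(developer, service)` tuple format and the names used are assumptions made for illustration.

```python
from collections import Counter


def assignment_violations(snapshot):
    """snapshot: (developer, service) pairs active at one moment in time.

    Returns (overloaded_developers, crowded_services): developers holding
    more than one service, and services held by more than one developer.
    """
    pairs = set(snapshot)  # ignore accidental duplicate rows
    per_developer = Counter(dev for dev, _ in pairs)
    per_service = Counter(svc for _, svc in pairs)
    overloaded = sorted(d for d, n in per_developer.items() if n > 1)
    crowded = sorted(s for s, n in per_service.items() if n > 1)
    return overloaded, crowded


good = [("Ann", "A"), ("Bob", "B")]
bad = [("Ann", "A"), ("Ann", "B"), ("Bob", "B")]
print(assignment_violations(good))
print(assignment_violations(bad))
```

Because the rule applies to a cross-section in time, a developer who finishes one service and then moves to the next never appears in the same snapshot twice and is not flagged.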
当使用 1:1 的比例将服务分配给开发人员时,服务之间的交互与开发人员之间的交互是同构的。考虑图 7-1。
When using 1:1 assignments of services to developers, it follows that the interaction between the services is isomorphic to the interaction between the developers. Consider Figure 7-1.
图 7-1系统的设计是团队的设计。(图片:Sapann Design/Shutterstock)
Figure 7-1 The system’s design is the team’s design. (Images: Sapann Design/Shutterstock)
服务之间的关系、交互和通信决定了开发人员之间的关系和交互。当使用 1:1 分配时,系统的设计就是团队的设计。
The relationship between the services, their interactions and communication, dictates the relationships and interactions between the developers. When using 1:1 assignment, the design of the system is the design of the team.
接下来,考虑图 7-2。虽然服务的数量及其大小与图 7-1相比没有变化,但没有人可以说这是一个好的设计。
Next, consider Figure 7-2. While the number of services and their size has not changed from Figure 7-1, no one could claim it is a good design.
图 7-2紧耦合的系统与团队
Figure 7-2 Tightly coupled system and team
良好的系统设计力求将模块之间的交互数量减少到最低限度——与图 7-2中的情况完全相反。如图 7-1所示的松耦合系统设计已将交互数量降至最低,以至于删除一个交互会使系统无法运行。
A good system design strives to reduce the number of interactions between the modules to their bare minimum—the exact opposite of what happens in Figure 7-2. A loosely coupled system design such as that in Figure 7-1 has minimized the number of interactions to the point that removing one interaction makes the system inoperable.
图 7-2中的设计显然是紧耦合的,它也描述了团队的运作方式。比较一下图 7-1和图 7-2中的团队。你更愿意加入哪个团队?图 7-2中的团队是一个高压力、脆弱的团队。团队成员可能有领土意识并抵制变化,因为每个变化都会产生连锁反应,扰乱他们的工作和其他人的工作。他们花费大量时间开会解决问题。相比之下,图 7-1中的团队可以在本地解决问题并控制它们。每个团队成员几乎都独立于其他成员,不需要花费太多时间协调工作。简而言之,图 7-1中的团队比图 7-2中的团队效率高得多。因此,拥有更好系统设计的团队更有可能在紧迫的最后期限前完成任务。
The design in Figure 7-2 is clearly tightly coupled, and it also describes the way the team operates. Compare the teams from Figure 7-1 and Figure 7-2. Which team would you rather join? The team in Figure 7-2 is a high-stress, fragile team. The team members are likely territorial and resist change because every change has ripple effects that disrupt their work and the work of everybody else. They spend an inordinate amount of time in meetings to resolve their issues. In contrast, the team in Figure 7-1 can address issues locally and contain them. Each team member is almost independent from the others and does not need to spend much time coordinating work. Simply put, the team in Figure 7-1 is far more efficient than the team in Figure 7-2. As a result, the team with the better system design has far better prospects of meeting an aggressive deadline.
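The isomorphism between service interactions and developer interactions can be illustrated with a small sketch. The two graphs below are invented stand-ins for the figures (a chain of four services versus a fully connected set), and the owner names are hypothetical; only the mapping itself reflects the argument in the text.

```python
def team_interactions(service_links, owner):
    """Map each service-to-service link onto the pair of developers who
    own those services (valid only under 1:1 assignment)."""
    return {frozenset((owner[a], owner[b])) for a, b in service_links}


owner = {"A": "Ann", "B": "Bob", "C": "Cid", "D": "Dee"}

# Loosely coupled: each service talks to at most two neighbors.
loose = [("A", "B"), ("B", "C"), ("C", "D")]

# Tightly coupled: every service talks to every other service.
names = sorted(owner)
tight = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]

print(len(team_interactions(loose, owner)), "developer-to-developer channels")
print(len(team_interactions(tight, owner)), "developer-to-developer channels")
```

With the same four services and the same four people, the tightly coupled design doubles the number of coordination channels the team has to maintain, and the gap widens as the number of services grows.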
最后这一点至关重要:大多数经理只是口头上支持系统设计,因为架构的好处(可维护性、可扩展性和可重用性)是长远利益。对于面临资源匮乏和时间紧迫的严酷现实的经理来说,未来的利益无济于事。如果有什么帮助的话,经理应该尽可能减少工作范围,以满足最后期限。由于系统设计对当前目标没有帮助,经理将放弃对设计的任何有意义的投资。遗憾的是,这样做,经理失去了履行承诺的所有机会,因为满足紧迫期限的唯一方法是采用世界一流的设计,从而产生最高效的团队。在努力获得管理层对你的设计工作的支持时,展示设计如何帮助实现眼前的目标。长期利益将由此产生。
This last observation is paramount: Most managers just pay lip service to system design because the benefits of architecture (maintainability, extensibility, and reusability) are down-the-road benefits. Future benefits do not help a manager who is facing the harsh reality of scant resources and a tight schedule. If anything, it behooves the manager to reduce the scope of work as much as possible to meet the deadline. Since system design is supposedly not helping with the current objectives, the manager will throw overboard any meaningful investment in design. Sadly, by doing so, the manager loses all chance of meeting the commitments, because the only way to meet an aggressive deadline is with a world-class design that yields the most efficient team. When striving to get management support for your design effort, show how design helps with the immediate objective. The long-term benefits will flow out of that.
虽然设计影响团队效率的方式可能是不言而喻的,但团队也会影响设计。在图 7-1中,如果两个开发人员不互相交流,那么设计的这个领域就会很薄弱。您应该将两个耦合的服务分配给两个自然而有效地相互合作的开发人员。
While the way the design affects the team efficiency may be self-evident, the team also affects the design. In Figure 7-1, if two developers do not talk with each other, then that area of the design will be weak. You should assign two coupled services to two developers who naturally work effectively with each other.
分配服务(或 UI 开发等活动)时,尽量保持任务连续性,即分配给每个人的任务之间的逻辑连续性。通常,此类任务分配遵循服务依赖关系图。如果服务 A 依赖于服务 B,则将 A 分配给 B 的开发人员。一个优点是,开发 A 的人已经熟悉 B,因而需要更少的上手时间。保持任务连续性的一个重要但经常被忽视的优势是,项目和开发人员的成功标准是一致的。开发人员有动力把 B 做好,以免在轮到做 A 时自食其果。完美的任务连续性几乎不可能实现,但它应该是目标。
When assigning services (or activities such as UI development), try to maintain task continuity, a logical continuation between tasks assigned to each person. Often, such task assignments follow the service dependency graph. If service A depends on service B, then assign A to the developer of B. One advantage is that the A developer who is already familiar with B needs less ramp-up time. An important, yet often overlooked advantage of maintaining task continuity is that the project and the developer’s win criteria are aligned. The developer is motivated to do an adequate job on B to avoid suffering when it is time to do A. Perfect task continuity is hardly ever possible, but it should be the goal.
最后,在分配任务时要考虑开发人员的个人技术倾向。例如,让安全专家设计 UI、让数据库专家实现业务逻辑或让初级开发人员实现消息总线或诊断等实用程序可能效果不佳。
Finally, take the developers’ personal technical proclivities into account when making assignments. For example, it will likely not work well to have the security expert design the UI, to have the database expert implement the business logic, or to have junior developers implement the utilities such as message bus or diagnostics.
工作量估算是您尝试回答某件事需要多长时间的问题的方法。有两种类型的估算:单个活动估算(估算分配给资源的活动的工作量)和总体项目估算。这两种类型的估算毫无关联,因为项目的总持续时间不是所有活动的工作量总和除以资源数量。这是由于利用人员的固有低效率、活动之间的内部依赖性以及您可能需要实施的任何风险缓解措施。
Effort estimation is how you try to answer the question of how long something will take. There are two types of estimations: individual activity estimation (estimating the effort for an activity assigned to a resource) and overall project estimation. The two types of estimations are unrelated, because the overall duration of the project is not the sum of effort across all activities divided by number of resources. This is due to the inherent inefficiency in utilizing people, the internal dependencies between activities, and any risk mitigation you may need to put in place.
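A small sketch shows why dividing total effort by headcount does not give the project duration. It accounts only for dependencies between activities (not for the inefficiency of utilizing people or for risk mitigation), and the activity names, durations, and dependency graph are made up for illustration.

```python
from functools import lru_cache


def critical_path_days(durations, depends_on):
    """Earliest possible finish of the whole network, assuming unlimited
    resources: the longest chain of dependent activities."""

    @lru_cache(maxsize=None)
    def finish(activity):
        prerequisites = depends_on.get(activity, ())
        start = max((finish(p) for p in prerequisites), default=0)
        return start + durations[activity]

    return max(finish(a) for a in durations)


durations = {"a": 10, "b": 20, "c": 15, "d": 5}      # days per activity
depends_on = {"b": ("a",), "c": ("a",), "d": ("b", "c")}

naive = sum(durations.values()) / 2                  # 50 days of effort, 2 people
longest_chain = critical_path_days(durations, depends_on)
print(naive, longest_chain)
```

Even with two people available, the chain a, b, d takes 35 days, so the naive figure of 25 days is unattainable regardless of staffing.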
在许多软件团队中,进行估算充其量只是一种很好的仪式,最坏的情况则是徒劳无功。软件行业估算结果不佳的原因如下:
In many software teams, engaging in estimations is at best a nice ritual and at worst an exercise in futility. The poor results of estimations in the software industry are due to several reasons:
活动耗时不确定,甚至活动清单不确定,是估算准确性差的主要原因。不要混淆因果关系:不确定性是原因,估算准确性差是结果。您必须主动减少不确定性,如本章后面所述。
Uncertainty in how long activities take, and even uncertainty in the list of activities, is the primary reason for poor accuracy of estimations. Do not confuse cause and effect: The uncertainty is the cause, and poor estimation accuracy is the result. You must proactively reduce the uncertainty, as described later in this chapter.
软件开发人员很少接受过简单有效的估算技术培训。大多数人只能依靠偏见、猜测和直觉。
Few people in software development are trained in simple and effective estimation techniques. Most are left to rely on bias, guesswork, and intuition.
许多人为了弥补不确定性而高估或低估,但这会导致更糟糕的结果。
Many people overestimate or underestimate in an attempt to compensate for the uncertainty, which results in far worse outcomes.
大多数人在列出活动时往往只看冰山一角。当然,如果你忽略了对成功至关重要的活动,你的估算就会出错。无论是忽略整个项目的活动,还是忽略活动内部阶段,都是如此。例如,估算人员可能只列出编码活动,或者在编码活动内部,只考虑编码而不考虑设计或测试。
Most people tend to look at just the tip of the iceberg when listing activities. Naturally, if you omit activities that are essential to success, your estimations will be off. This is true both when omitting activities across the project and when omitting internal phases inside activities. For example, estimators may list just the coding activities or, inside coding activities, account for coding but not design or testing.
正如刚才提到的,人们倾向于高估或低估,以试图弥补不确定性。这两者对项目的成功都是致命的。
As just mentioned, people tend to overestimate and underestimate in an attempt to compensate for uncertainty. Both of these are deadly when it comes to project success.
根据帕金森定律,高估永远行不通。2例如,如果你给开发人员三周时间来完成一项需要两周才能完成的工作,开发人员不会只用两周做完这项工作,然后闲置一周。相反,开发人员将花三周时间从事这项工作。由于实际工作只占用了三周中的两周时间,因此在额外的一周时间里,开发人员将从事镀金工作,即添加没有人需要或想要、也不属于设计一部分的花哨功能、方面和能力。这种镀金工作大大增加了任务的复杂性,而增加的复杂性大大降低了成功的可能性。因此,开发人员需要四到六周的时间才能完成最初的任务。项目中的其他开发人员原本希望在三周后收到代码,但现在他们的工作也耽误了。此外,现在团队拥有的代码模块可能会存续多年并跨越多个版本,但它比最初应该有的复杂得多。
Overestimation never works because of Parkinson’s law.2 For example, if you give a developer three weeks to perform a two-week activity, the developer will not simply work on it for two weeks and then be idle for a week. Instead, the developer will work on the activity for three weeks. Since the actual work consumed only two of those three weeks, in the extra week the developer will engage in gold plating: adding bells and whistles, aspects, and capabilities that no one needs or wants, and that were not part of the design. This gold plating significantly increases the complexity of the task, and the increased complexity drastically reduces the probability of success. Consequently, the developer labors for four or six weeks to finish the original task. Other developers in the project, who expect to receive the code after three weeks, are now delayed, too. Furthermore, the team now owns, perhaps for years and across multiple versions, a code module that is needlessly more complex than what it should have been in the first place.
2. Cyril N. Parkinson,《帕金森定律》,《经济学人》 (1955 年 11 月 19 日)。
2. Cyril N. Parkinson, “Parkinson’s Law,” The Economist (November 19, 1955).
低估同样会导致失败。毫无疑问,给开发人员两天时间来完成为期两周的编码活动将使任何金光闪闪的事情都化为泡影。问题是开发人员会试图快速而粗略地完成活动,偷工减料,无视所有已知的最佳实践。这就像要求外科医生快速而粗略地为您做手术或要求承包商快速而粗略地建造房屋一样明智。
Underestimation guarantees failure just as well. Undoubtedly, giving a developer two days to perform a two-week coding activity will preclude any gold plating. The problem is that the developer will try to do the activity quick-and-dirty, cutting corners and disregarding all known best practices. This is as sensible as asking a surgeon to operate on you quick-and-dirty or a contractor to build a house quick-and-dirty.
遗憾的是,任何复杂的任务都不能快速完成。取而代之的是快速干净和脏乱慢两种选择。由于开发人员缺乏软件开发中的所有最佳实践,从测试到详细设计再到文档,开发人员现在正试图以最糟糕的方式执行任务。因此,假设工作正确完成,开发人员将不会在原本可能花费的两周时间内完成该活动,而会因为质量低下和复杂性增加而花费四到六周(或更长时间)。与高估一样,项目中其他开发人员原本希望在预定的两天后完成代码,但结果却被大大推迟了。此外,团队现在必须拥有一个以最糟糕的方式完成的代码模块,也许需要数年时间,跨越多个版本。
Sadly, there is no quick-and-dirty with any intricate task. Instead, the two options are quick-and-clean and dirty-and-slow. Because the developer is missing all the best practices in software development, from testing to detailed design to documentation, the developer is now trying to perform the task in the worst possible way. Consequently, the developer will not work on the activity for the nominal two weeks it could have taken, assuming the work was performed correctly, but will work on it for four or six (or more) weeks due to the low quality and increased complexity. As with overestimation, other developers in the project who expected the code after the scheduled two days are much delayed. Furthermore, the team now has to own, perhaps for years and across multiple versions, a code module that is done the worst possible way.
虽然这些结论可能符合常识,但许多人忽视了这些典型错误的严重程度。图 7-3 以定性方式绘制了成功概率与估计的关系。例如,考虑一个为期 1 年的项目。在适当的架构和项目设计下,项目的正常估计时间为 1 年,如图 7-3 中的点 N 所示。如果给这个项目一天的时间,成功的概率是多少?一周?一个月?显然,如果估计足够激进,成功的概率为零。6 个月呢?虽然为期 1 年的项目在 6 个月内完成的概率极低,但并不是零,因为也许会发生奇迹。如果估计为 11 个月零 3 周,成功的概率实际上非常高,对于 11 个月来说也相当高。但是,项目不太可能在 9 个月内完成。因此,在正常估计的左侧是一个临界点,成功概率在此以非线性方式大幅提高。同样,这个为期 1 年的项目可以持续 13 个月,甚至 14 个月也是合理的。但如果你给这个项目 18 或 24 个月的时间,你肯定会扼杀它,因为帕金森定律会发挥作用:工作会扩大以填满分配的时间,项目会因为复杂性的增加而失败。因此,在正常估计的右侧存在另一个临界点,成功的可能性再次以非线性方式崩溃。
While these conclusions may make common sense, what many miss is the magnitude of these classic mistakes. Figure 7-3 plots in a qualitative manner the probability of success as a function of the estimation. For example, consider a 1-year project. With proper architecture and project design, the project’s normal estimation is 1 year, indicated by point N in Figure 7-3. What would be the probability of success if you give this project a day? A week? A month? Clearly, with sufficiently aggressive estimations, the probability of success is zero. How about 6 months? While the probability of a 1-year project completing in 6 months is extremely low, it is not zero because maybe a miracle will happen. The probability of success if you estimate at 11 months and 3 weeks is actually very high, and it is also fairly high for 11 months. However, it is unlikely the project can complete in 9 months. Therefore, to the left of the normal estimation is a tipping point where the probability of success drastically improves in a nonlinear way. Similarly, this 1-year project could last 13 months, and even 14 months is reasonable. But if you give this project 18 or 24 months, you will surely kill it because Parkinson’s law will kick in: Work will expand to fill the allotted time, and the project will fail due to the increased complexity. Therefore, another tipping point exists to the right of the normal estimation, where the probability of success again collapses in a nonlinear way.
图 7-3成功概率与估计的关系 [摘自 Steve McConnell 的《快速开发》(微软出版社,1996 年)并经过修改。]
Figure 7-3 Probability of success as a function of estimation [Adopted and modified from Steve McConnell, Rapid Development (Microsoft Press, 1996).]
图 7-3说明了良好的名义估计至关重要,因为它们以非线性的方式最大化了成功概率。过去,当你低估和高估时,你很可能会伤害自己和他人。这些不仅是常见的经典错误,而且是根本性错误。
Figure 7-3 illustrates the paramount importance of good nominal estimations because they maximize the probability of success, in a nonlinear way. In the past, you were likely to hurt yourself and others when you both underestimated and overestimated. These are not just common, classic mistakes—they are cardinal mistakes.
尽管数十年来,软件行业已经出现了一套行之有效的估算技术,并且已在多个其他行业中广泛使用,但软件行业在估算方面的糟糕记录仍然存在。我还没有看到一个团队在正确地进行估算的同时,项目设计和承诺也偏离了目标。本节并不试图回顾所有这些技术,而是重点介绍我多年来发现的一些最简单、最有效的想法和技术。
The poor track record with estimations in the software industry persists even though a decent set of effective estimation techniques has been available for decades and across multiple other industries. I have yet to see a team that has practiced estimations correctly and was also off the mark with their project design and commitments. Instead of trying to review all of these techniques, this section highlights some of the ideas and techniques I have found over the years to be the most simple and effective.
Good estimations are accurate, but not precise. For example, consider an activity that actually took 13 days and had two estimations: 10 days and 23.8 days. While the second estimation is far more precise, clearly the first estimation is better because it is more accurate. With estimations, accuracy counts more than precision. Since most software projects significantly veer off from their commitments at delivery (sometimes by multiples of the initial estimations), it is nonsensical for the people involved in those projects to estimate the activities down to the hour or the day.
Estimations also must match the tracking resolution. If the project manager tracks the project on a weekly basis, any estimation less than a week is pointless because it is smaller than the measurement resolution. Doing so makes as much sense as estimating the size of your house down to the micron when using a measuring tape for the actual measurement.
Even when an activity is actually 13 days in duration, it is better to estimate it as 15 days rather than 12.5 days. Any decent-size project will likely have several dozen activities; by opting for accuracy, you will probably overestimate (a little) on some activities and underestimate (a little) on others. On average, though, your estimations will be fairly accurate. If you are trying to be precise, you can accumulate errors because you do not allow for errors in the estimations to cancel each other out. In addition, if you ask people for precise estimations, they will endlessly agonize and deliberate on them. If you ask for accurate estimations, the estimations will be easy, simple, and quick to make.
Uncertainty is the leading cause of missed estimations. It is important not to confuse the unknown with the uncertain. For example, while the exact day of my demise is unknown, it is far from uncertain, and a whole industry (life insurance) is based on the ability to estimate that date. While the estimation may not be precise when it comes to me specifically, the life insurance industry has sufficient customers to make it accurate enough.
When asking people to estimate, you should help them overcome their fear of estimations. Many may have had their poor estimations used against them in the past. You may even encounter refusal to estimate in the form of “I don’t know” or “Estimations never work.” Such attitudes may indicate fear of entrapment, or trying to avoid the effort of estimating, or being ignorant and inexperienced in estimation techniques, rather than a fundamental inability to estimate.
Confronted with the uncertain, take these steps:
Ask first for the order of magnitude: Is the activity more like a day, a week, a month, or a year? With the magnitude known, narrow it down using a factor of 2 to zoom in. For example, if the answer to the first question was a month as the type of unit, ask if it is more like two weeks, one month, two months, or four months. The first answer rules out eight months (since that is more like a year as an order of magnitude), and it cannot be one week because that was not provided in the first place as an order of magnitude.
Make an explicit effort to list the areas of uncertainty in the project and focus on estimating them. Always break down large activities into smaller, more manageable activities to greatly increase the accuracy of the estimations.
Invest in an exploratory discovery effort that will give insight into the nature of the problem and reduce the uncertainty. Review the history of the team or the organization, and learn from your own history how long things have taken in the past.
One estimation technique dealing specifically with high uncertainty is part of the Program Evaluation and Review Technique (PERT).[3] For every activity, you provide three estimations: the most optimistic, the most pessimistic, and the most likely. The final estimation is provided by this formula:
$$E = \frac{O + 4M + P}{6}$$
where:
E is the calculated estimation.
O is the optimistic estimation.
M is the most likely estimation.
P is the pessimistic estimation.
3. https://en.wikipedia.org/wiki/Program_evaluation_and_review_technique
For example, if an activity has an optimistic estimation of 10 days, a pessimistic estimation of 90 days, and a most likely estimation of 25 days, the PERT estimation for it would be 33.3 days:
$$E = \frac{10 + 4 \cdot 25 + 90}{6} = \frac{200}{6} \approx 33.3$$
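The PERT calculation is easy to automate when you have many activities to estimate. The following is a minimal Python sketch of the weighted average described above; the function name `pert_estimate` and the ordering check on the inputs are illustrative assumptions, not part of PERT itself.

```python
def pert_estimate(optimistic: float, most_likely: float, pessimistic: float) -> float:
    """Return the PERT estimation E = (O + 4M + P) / 6."""
    # Assumption: the three inputs should be ordered; reject obviously bad input.
    if not optimistic <= most_likely <= pessimistic:
        raise ValueError("expected optimistic <= most_likely <= pessimistic")
    return (optimistic + 4 * most_likely + pessimistic) / 6


# The activity from the text: 10 days optimistic, 25 most likely, 90 pessimistic.
estimate = pert_estimate(optimistic=10, most_likely=25, pessimistic=90)
print(round(estimate, 1))  # 33.3
```

Note how the heavy weight on the most likely value keeps the 90-day pessimistic case from dominating the result.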
Estimating the project as a whole is useful primarily for project design validation, but can also be beneficial when initiating project design. When you finish the detailed project design, compare it to the overall project estimation. The two need not match perfectly but should be congruent and validate each other. For example, if the detailed project design was 13 months and the overall project estimation was 11 months, then the detailed project design is valid. But if the overall estimation was 18 months, then at least one of these numbers is wrong, and you must investigate the source of the discrepancy. You can also utilize the overall project estimation when dealing with a project with very few up-front constraints. Such a clean canvas project has a great deal of unknowns, making it difficult to design. You can use the overall project estimation to work backward to box in certain activities as a way of initiating the project design process.
With overall project estimation, your track record and history matter the most. With even a modest degree of repeatability (see Figure 6-1), it is unlikely that you could deliver the project faster or slower than similar projects in the organization’s past. The dominant factor in throughput and efficiency is the organization’s nature, its own unique fingerprint of maturity, which is something that does not change overnight or between projects. If it took your company a year to deliver a similar project in the past, then it will take it a year in the future. Perhaps this project could be done in six months somewhere else, but with your company it will take a year. There is some good news here, though: Repeatability also means the company likely will not take two or three years to complete the project.
A great yet little-known technique for overall project estimation is leveraging project estimation tools. These tools typically assume some nonlinear relationship exists between size and cost, such as a power function, and use a large number of previously analyzed projects as their training data. Some tools even use Monte Carlo simulations to narrow down the range of the variables based on your project attributes or historical records. I have used such tools for decades, and they produce accurate results.
The broadband estimation is my adaptation of the Wideband Delphi[4] estimation technique. The broadband estimation uses multiple individual estimations to identify the average of the overall project estimation, then adds a band of estimations above and below it. You use the estimations outside the band to gain insight into the nature of the project and refine the estimations, repeating this process until the band and the project estimations converge.
4. Barry Boehm, Software Engineering Economics (Prentice Hall, 1981).
To start any broadband estimation effort, first assemble a large group of project stakeholders, ranging from developers to testers, managers, and even support people—diversity of the group is key with the broadband technique. Strive for a mix of newcomers and veterans, devil’s advocates, experts and generalists, creative people, and worker bees. You want to tap into the group’s synergy of knowledge, intelligence, experience, intuition, and risk assessment. A good group size is between 12 and 30 people. Using fewer than 12 participants is possible, but the statistical element may not be strong enough to produce good results. With more than 30 participants, it is difficult to finish the estimation in a single meeting.
Begin the meeting by briefly describing the current state and phase of the project, what you have already accomplished (such as architecture), and additional contextual information (such as the system’s operational concepts) that may not be known to stakeholders who were not part of the core team. Each participant needs to estimate two numbers for the project: how long it will take in months and how many people it will require. Have the estimators write these numbers, along with their name, on a note. Collect the notes, enter them in a spreadsheet, and calculate both the average and the standard deviation for each value. Now, identify the estimations (both in time and people) that were at least one standard deviation removed from the average, that is, those values outside the broadband of consensus (hence the name of the technique). These are the outliers.
Instead of culling the outliers from the analysis (the common practice in most statistical methods), solicit input from those who produced them—because they may know something that the others do not. This is a great way of identifying the uncertainties. Once the outliers have voiced their reasoning for the estimation and all have heard it, you conduct another round of estimations. You repeat this process until all estimations fall within one standard deviation, or the deviation is less than your measurement resolution (such as one person or one month). Broadband estimation typically converges this way by the third round.
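The bookkeeping for one broadband round can be done in a spreadsheet, but a short script makes the rule explicit. The sketch below is one possible reading of the procedure, not a definitive implementation: the function names, the participant labels `P1` through `P12`, and all the numbers are invented for illustration. Using the sample standard deviation (`statistics.stdev`) is also an assumption, since the text does not say which variant to use.

```python
from statistics import mean, stdev


def find_outliers(estimates):
    """Return the estimations at least one standard deviation from the average."""
    values = list(estimates.values())
    average, band = mean(values), stdev(values)
    return {name: value for name, value in estimates.items()
            if abs(value - average) >= band}


def has_converged(estimates, resolution=1.0):
    """True when the deviation is below the measurement resolution,
    or when every estimation falls within one standard deviation."""
    values = list(estimates.values())
    average, band = mean(values), stdev(values)
    if band < resolution:
        return True
    return all(abs(value - average) <= band for value in values)


# Hypothetical duration estimations (in months) from 12 participants.
round_one_months = dict(zip(
    [f"P{i}" for i in range(1, 13)],
    [12, 14, 13, 12, 15, 13, 24, 14, 12, 13, 6, 14],
))
print(sorted(find_outliers(round_one_months)))  # ['P11', 'P7']
print(has_converged(round_one_months))          # False

# A later round, after the outliers explained their reasoning.
later_round_months = dict(zip(
    [f"P{i}" for i in range(1, 13)],
    [12, 13, 13, 12, 13, 13, 14, 13, 12, 13, 12, 13],
))
print(has_converged(later_round_months))        # True
```

The outliers are reported by name rather than discarded, because the point of the round is to ask those participants what they know. You would run the same functions on the headcount estimations.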
Overall project estimation, whether done by using historical records, estimation tools, or the broadband method, tends to be accurate, if not highly accurate. You should compare the various overall estimations to ensure that you do, indeed, have a good estimation. Unfortunately, while these overall estimations are accurate, they merely augment and verify your detailed project design effort. They serve only as reinforcement and a sanity check because they are not actionable on their own. You may be fairly certain that the project requires 18 months and 6 people, but as yet you have no idea how to utilize those resources to finish the project on that schedule. You have to design the project to learn this information.
You start the project design with the estimated duration of the individual activities in the project. Before you estimate individual activities, you must prepare a meticulous list of all activities in the project, both coding and noncoding activities alike. In a way, even that list of activities is an estimation of the actual set of activities, so the same rationale about reducing uncertainties holds true here. Avoid the temptation to focus on the structural coding activities indicated by the system architecture, and actively look below the waterline at the full extent of the iceberg. Invest time in looking for activities, and ask other people to compile that list so you can compare it with your own list. Have colleagues review, critique, and challenge your list of activities. You may be surprised by what you actually missed.
Since accuracy is superior to precision, a best practice is to always use a quantum of 5 days in any activity estimation. Activities that take 1 or 2 days should not be part of the plan. Activities that are 3 or 4 days are always estimated at 5 days. Activities are either 5, 10, 15, 20, 25, 30, or 35 days long. Activities estimated at 40 or more days may be good candidates to break down into smaller activities to reduce the uncertainty. Using 5 days for each activity aligns the project nicely on week boundaries and reduces waste of parts of weeks before or after an activity. This practice also matches real life—no activity has ever started on a Friday.
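The 5-day quantum can be applied mechanically to raw estimations. The following Python sketch is illustrative only: the names `quantize` and `should_split` are hypothetical, and rounding to the nearest multiple of 5 (with halves rounded up) is an assumption, since the text specifies only that 3-day and 4-day activities become 5 days and that a 13-day activity is better estimated at 15 days.

```python
import math

QUANTUM_DAYS = 5
SPLIT_THRESHOLD_DAYS = 40  # activities this long are candidates for breakdown


def quantize(raw_days: float, quantum: int = QUANTUM_DAYS) -> int:
    """Snap a raw estimation to a multiple of the quantum, never below one quantum."""
    if raw_days < 3:
        # Activities of 1 or 2 days should not be separate items in the plan.
        raise ValueError("fold activities shorter than 3 days into a larger activity")
    # Round half up explicitly; Python's built-in round() rounds halves to even.
    multiples = math.floor(raw_days / quantum + 0.5)
    return max(1, multiples) * quantum


def should_split(estimated_days: int) -> bool:
    """Flag activities that may be worth decomposing to reduce uncertainty."""
    return estimated_days >= SPLIT_THRESHOLD_DAYS


print(quantize(3), quantize(4), quantize(13))  # 5 5 15
print(should_split(quantize(38)))              # True
```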
The reduction in uncertainty benefits even regular-size activities. Force yourself and others to break down each activity into tasks in addition to coding, such as learning curves, test clients, installation, integration points, peer reviews, and documentation. Again, by avoiding focusing on coding and examining the full scope of the work ahead, you greatly reduce the uncertainty of individual activity estimations.
If you ask others to estimate an activity, you must maintain a correct estimation dialog with them. Never dictate duration by saying, “You have two weeks!” Not only is that based on nothing, but the owner of the activity also does not feel accountable to actually finish in two weeks. When people are unaccountable, progress and quality will be lacking. Avoid leading questions, such as “It is going to take two weeks, right?” While this is somewhat better than dictating the estimation, you now bias the other party toward your estimation. Even if the person agrees, he or she still will not feel accountable to your estimation. A far better question is the open question, “How long will it take?” Do not accept an immediate answer. Always force people to get back to you later with the answer because you want them to itemize what is really involved and to reflect and contemplate on the answer. You must have good estimations to maximize the probability of success and people’s accountability (see Figure 7-3).
To calculate the actual duration of a project as well as several other key aspects of the project, you need to find the project’s critical path. Critical path analysis is the single most important project design technique. However, you cannot perform this analysis without the following prerequisites:
The system architecture. You must have the decomposition of the system into services and other building blocks such as Clients and Managers. While you could design a project with even a bad architecture, that is certainly less than ideal. A bad system design will keep changing, and with it, your project design will change. It is crucial that the system architecture be valid, so that it holds true over time.
A list of all project activities. Your list must contain both coding and noncoding activities. It is straightforward to derive the list of most coding activities by examining the architecture. The list of noncoding activities is obtained as discussed previously and is also a product of the nature of the business. For example, a banking software company will have compliance and regulatory activities.
Activity effort estimation. Have an accurate estimation of the effort for each activity in the list of activities. You should use multiple estimation techniques to drive accuracy.
Services dependency tree. Use the call chains to identify the dependencies between the various services in the architecture.
Activity dependencies. Beyond the dependencies between your services, you must compile a list of how all activities depend on other activities, coding and noncoding alike. Add explicit integration activities as needed.
Planning assumptions. You must know the resources available for the project or, more correctly, the staffing scenarios that your plan calls for. If you have several such scenarios, then you will have a different project design for each availability scenario. The planning assumptions will include which type of resource is required at which phase of the project.
You can graphically arrange the activities in the project into a network diagram. The network diagram shows all activities in the project and their dependencies. You first derive the activity dependencies from the way the call chains propagate through the system. For each of the use cases you have validated, you should have a call chain or sequence diagram showing how some interaction between the system’s building blocks supports each use case. If one diagram has Client A calling Manager A and a second diagram has Client A calling Manager B, then Client A depends on both Manager A and Manager B. In this way, you systematically discover the dependencies between the components of the architecture. Figure 7-4 shows the dependency chart of the code modules in a sample Method-based architecture.
Figure 7-4 Services dependency chart
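Deriving dependencies from call chains amounts to taking the union of the calls seen across all use cases. The sketch below assumes each call chain is recorded as a list of (caller, callee) pairs; that representation, the function name `derive_dependencies`, and the component `Engine A` are hypothetical and serve only to illustrate the idea.

```python
def derive_dependencies(call_chains):
    """Merge the calls seen in every call chain into one dependency map."""
    dependencies = {}
    for chain in call_chains:
        for caller, callee in chain:
            # The caller depends on everything it calls in any use case.
            dependencies.setdefault(caller, set()).add(callee)
            # Make sure components that call nothing still appear in the map.
            dependencies.setdefault(callee, set())
    return dependencies


# Two hypothetical use cases, each described by its call chain.
use_case_1 = [("Client A", "Manager A"), ("Manager A", "Engine A")]
use_case_2 = [("Client A", "Manager B")]

dependencies = derive_dependencies([use_case_1, use_case_2])
print(sorted(dependencies["Client A"]))  # ['Manager A', 'Manager B']
```

As in the text, Client A ends up depending on both Manager A and Manager B because each appears in a different call chain.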
The dependency chart shown in Figure 7-4 has several problems. First, it is highly structural and is missing all the nonstructural coding and noncoding activities. Second, it is graphically bulky and with larger projects would become visually too crowded and unmanageable. Third, you should avoid grouping activities together, as is the case with the Utilities in the figure.
You should turn the diagram in Figure 7-4 into the detailed abstract chart shown in Figure 7-5. That chart now contains all activities, coding and noncoding alike, such as architecture and system testing. You may want to also add a side legend identifying the activities for easy review.
Figure 7-5 Project network
The effort estimation for an activity alone does not determine when that activity will complete: Dependencies on other activities also come into play. Therefore, the time to finish each activity is the sum of the effort estimation for that activity and the time it takes to get to that activity in the project network. The time to get to an activity, or the time it takes to be ready to start working on the activity, is the maximum of the times of all network paths leading to that activity. In a more formal manner, you calculate the time for completing activity i in the project with this recursive formula:
$$T_i = E_i + \max(T_{i-1}, T_{i-2}, \ldots, T_{i-n})$$
where:
T_i is the time for completing activity i.
E_i is the effort estimation for activity i.
n is the number of activities leading directly to activity i.
T_{i-1}, ..., T_{i-n} are the completion times of those n preceding activities (for an activity with no predecessors, the maximum is taken as zero).
The time for each of the preceding activities is resolved the same way. Using regression, you can start with the last activity in the project and find the completion time for each activity in the network. For example, consider the activity network in Figure 7-6.
Figure 7-6 Project network used in the time calculation example
In the diagram in Figure 7-6, activity 5 is the last activity. Thus, the set of regression expressions that define the time to finish activity 5 are:
Note that the time to finish activity 5 depends on the effort estimation of the previous activities as much as it depends on the network topology. For example, if all the activities in Figure 7-6 are of equal duration, then:
However, if all activities except activity 6 are estimated at 5 days, and activity 6 is estimated at 20 days, then:
While you could manually calculate the activity times for small networks such as Figure 7-6, this calculation quickly gets out of hand with large networks. Computers excel at regression problems, so you should use tools (such as Microsoft Project or a spreadsheet) to calculate activity times.
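If you prefer a script to a spreadsheet, the recursive calculation takes only a few lines. The following Python sketch assumes the network is given as two dictionaries, one mapping each activity to its effort estimation in days and one mapping each activity to the activities leading directly to it. The function name `completion_times` and the four-activity network are hypothetical; it is not the network of Figure 7-6.

```python
from functools import lru_cache


def completion_times(effort, predecessors):
    """Return the time to complete every activity in the network."""

    @lru_cache(maxsize=None)  # each activity is resolved only once
    def finish(activity):
        # Time to get to the activity: the latest of its direct predecessors,
        # or zero when nothing precedes it.
        ready = max((finish(p) for p in predecessors.get(activity, ())), default=0)
        return effort[activity] + ready

    return {activity: finish(activity) for activity in effort}


# Hypothetical network: B and C both follow A, and D needs both B and C.
effort = {"A": 5, "B": 10, "C": 5, "D": 5}
predecessors = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}

print(completion_times(effort, predecessors))
# {'A': 5, 'B': 15, 'C': 10, 'D': 20}
```

Activity D completes on day 20 rather than day 15 because it must wait for the slower of its two predecessors.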
By calculating the activity times, you can identify the longest possible path in the network of activities. In this context, the longest path means the path with greatest duration, not necessarily the one with the greatest number of activities. For example, the project network in Figure 7-7 has 17 activities, each of different estimated duration (the numbers in Figure 7-7 are just the activity IDs; durations are not shown).
Figure 7-7 Identifying the critical path
Based on the effort estimation for each activity and the dependencies, using the formula given earlier and starting from activity 17, the longest path in the network is shown in bold. That longest path in the network is called the critical path. You should highlight the critical path in your network diagrams using a different color or bold lines. Calculating the critical path is the only way to answer the question of how long it will take to build the system.
Because the critical path is the longest path in the network, it also represents the shortest possible project duration. Any delay on the critical path delays the entire project and jeopardizes your commitments.
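Once the completion times are known, the critical path can be traced by starting at the last activity and repeatedly stepping to the predecessor that finishes latest. The sketch below is a minimal illustration under the same assumed data layout (effort and direct predecessors as dictionaries). The function name `critical_path` and the five-activity network are invented; the network is not the one in Figure 7-7.

```python
def critical_path(effort, predecessors, last):
    """Return (path, duration) for the longest path ending at `last`."""
    finish = {}

    def time_to_finish(activity):
        if activity not in finish:
            ready = max((time_to_finish(p) for p in predecessors.get(activity, ())),
                        default=0)
            finish[activity] = effort[activity] + ready
        return finish[activity]

    time_to_finish(last)

    # Walk backward, always following the predecessor that completes latest.
    path = [last]
    while predecessors.get(path[-1]):
        path.append(max(predecessors[path[-1]], key=finish.__getitem__))
    path.reverse()
    return path, finish[last]


# Hypothetical network with effort estimations in days.
effort = {1: 5, 2: 10, 3: 15, 4: 5, 5: 10}
predecessors = {3: [1], 4: [1, 2], 5: [3, 4]}

path, duration = critical_path(effort, predecessors, last=5)
print(path, duration)  # [1, 3, 5] 30
```

The path through activities 1, 3, and 5 has fewer dependencies than the route through activity 4, yet it is critical because it takes the longest, which matches the point that the critical path is defined by duration and not by the number of activities.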
No project can ever be accelerated beyond its critical path. Put another way, you must build the system along its critical path to build the system the quickest possible way. This is true in any project, regardless of technology, architecture, development methodology, development process, management style, and team size.
In any project with multiple activities on which multiple people are working, you will have a network of activities with a critical path. The critical path does not care if you acknowledge it or not; it is just there. Without critical path analysis, the likelihood of developers building the system along the critical path is nearly zero. Working this way is likely to be substantially slower.
During project design, the architect assigns abstract resources (such as Developer 1) to each of the project design options. Only after the decision makers have chosen a particular project design option can the project manager assign actual resources. Since any delay in the critical path will delay the project, the project manager should always assign resources to the critical path first. You should take matters a step further by always assigning your best resources to the critical path. By “best,” I mean the most reliable and trustworthy developers, the ones who will not fail to deliver. Avoid the classic mistake of first assigning developers to high-visibility but noncritical activities, or to activities that the customer or management care the most about. Assigning development resources first to noncritical activities does nothing to accelerate the project. Slowing down the critical path absolutely slows down the project.
During project design, for each project design option the architect needs to find out how many resources (such as developers) the project will require overall. The architect discovers the required staffing level iteratively. Consider the network in Figure 7-7, where the critical path is already identified, and assume each node is a service. How many developers are required on the first day of the project? If you were given just a single developer, that developer is by definition your best developer, so the single developer goes to activity 1. If you are given two developers, then you can assign the second developer to activity 2, even though that activity is not required until much later. If you are given three developers, then the third developer is at best idle, and at worst disrupting the developer working on activity 1. Therefore, the answer to the question of how many developers are required on day 1 of the project is at most two developers.
Next, suppose activity 1 is complete. How many developers are required now? The answer is at most six (activities 3, 4, 5, 6, 7, and 2 are available). However, asking for six developers is less than ideal since by the time you have progressed up the critical path to the level of activities 8 or 12, you need only three or even two developers. Perhaps it is better to ask for just four developers instead of six developers once activity 1 is complete. Utilizing only four as opposed to six developers has two significant advantages. First, you will reduce the cost of the project. A project with four developers is 33% less expensive than a project with six developers. Second, a team of four developers is far more efficient than a team of six developers. The smaller team will have less communication overhead and less temptation for interference from the idle hands.
Based on this criterion alone, a team of three or even two developers would be better than a team of four developers. However, when examining the network of Figure 7-7 it is likely impossible to build the system with just three developers and keep the same duration. With so few developers, you will paint yourself into a corner in which a developer on the critical path needs a noncritical activity that is simply not ready yet (such as activity 15 needing activity 11). This promotes a noncritical activity to a critical activity, in effect creating a new and longer critical path. I call this situation subcritical staffing. When the project goes subcritical, it will miss its deadline because the old critical path no longer applies.
The real question is not how many resources are required. The question to ask at any point of the project is:
What is the lowest level of resources that allows the project to progress unimpeded along the critical path?
Finding this lowest level of resources keeps the project critically staffed at all points in time and delivers the project at the least cost and in the most efficient way. Note that the critical level of staffing can and should change throughout the life of the project.
Imagine a group of developers without project design. The likelihood of that group constituting the lowest level of resources required to progress unimpeded along the critical path is nearly zero. The only way to compensate for the unknown staffing needs of the project is by using horrendously wasteful and inefficient overcapacity staffing. As illustrated previously, working this way cannot be the fastest way of completing the project—and now you see it also cannot be the least costly way of building the system. My experience is that overcapacity can be more expensive than the lowest cost level by many multiples.
回到图 7-7中的网络,一旦你得出结论,你可以尝试只用四名开发人员构建系统,你就会面临一个新的挑战:你将在何时何地部署这四名开发人员?例如,活动1完成后,你可以将开发人员分配到活动3、4、5、6或3、5、6、7或3、4、6、2等等。即使是一个简单的网络,可能性的组合范围也是惊人的。这些选项中的每一个都有一组可能的下游分配。
Returning to the network in Figure 7-7, once you have concluded that you could try to build the system with only four developers, you face a new challenge: Where and when will you deploy these four developers? For example, with activity 1 complete, you could assign the developers to activities 3, 4, 5, 6 or 3, 5, 6, 7, or 3, 4, 6, 2, and so on. Even with a simple network, the combinatorial spectrum of possibilities is staggering. Each of these options would have its own set of possible downstream assignments.
幸运的是,您不必尝试任何这些组合。请查看图 7-7 中的活动 2。您实际上可以推迟为活动 2 分配资源,直到活动 16(位于关键路径上)必须开始的那一天减去活动 2 的预计持续时间。活动 2 可以“浮动”到顶部(保持未分配且不开始),直到它碰到活动 16。所有非关键活动都有浮动时间,即您可以在不延迟项目的情况下推迟完成它们的时间量。关键活动没有浮动时间(或更准确地说,它们的浮动时间为零),因为这些活动中的任何延迟都会延迟项目。为项目分配资源时,请遵循以下规则:
Fortunately, you do not have to try any of these combinations. Examine activity 2 in Figure 7-7. You can actually defer assigning resources to activity 2 until the day that activity 16 (which is on the critical path) must start, minus the estimated duration of activity 2. Activity 2 can “float” to the top (remain unassigned and not start) until it bumps against activity 16. All noncritical activities have float, which is the amount of time you could delay completing them without delaying the project. Critical activities have no float (or more precisely, their float is zero) since any delay in these activities would delay the project. When you assign resources to the project, follow this rule:
始终根据浮动时间分配资源。
Always assign resources based on float.
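Float can be computed mechanically from the network. The following is a minimal sketch, assuming the network is given as activity durations plus predecessor lists; the four-activity network, its durations, and the function name `compute_floats` are hypothetical and are not taken from Figure 7-7. It uses the standard forward and backward passes of critical path analysis.

```python
from collections import defaultdict


def compute_floats(durations, predecessors):
    """Return {activity: total float in workdays} for a dependency network."""
    # Build successor lists and order the activities topologically.
    successors = defaultdict(list)
    indegree = {activity: 0 for activity in durations}
    for activity, preds in predecessors.items():
        for pred in preds:
            successors[pred].append(activity)
            indegree[activity] += 1
    order = []
    ready = [a for a, count in indegree.items() if count == 0]
    while ready:
        current = ready.pop()
        order.append(current)
        for nxt in successors[current]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # Forward pass: earliest finish of every activity.
    earliest_finish = {}
    for a in order:
        start = max((earliest_finish[p] for p in predecessors.get(a, [])), default=0)
        earliest_finish[a] = start + durations[a]
    project_duration = max(earliest_finish.values())

    # Backward pass: latest finish that does not delay the project.
    latest_finish = {}
    for a in reversed(order):
        latest_finish[a] = min(
            (latest_finish[s] - durations[s] for s in successors[a]),
            default=project_duration,
        )

    # Float is how long an activity can slip without delaying the project.
    return {a: latest_finish[a] - earliest_finish[a] for a in durations}


# Hypothetical network: A precedes B and C, and both precede D.
durations = {"A": 10, "B": 20, "C": 5, "D": 15}
predecessors = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
floats = compute_floats(durations, predecessors)
print(floats)
```

In this made-up network, A, B, and D form the critical path (zero float), while C can slip by 15 days before it becomes critical.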
要弄清楚在活动 1 完成后如何分配上例中的开发人员,请计算活动 1 完成后所有可以开始的活动的浮动时间,并根据浮动时间从低到高分配四名开发人员。首先,为关键路径分配一名开发人员,这并不是因为它特殊,而是因为它的浮动时间最低。现在,假设活动 2 有 60 天的浮动时间,活动 4 有 5 天的浮动时间。这意味着,如果您将活动 4 推迟超过 5 天,项目就会脱轨。相比之下,活动 2 最多可以推迟 60 天,因此您将下一位开发人员分配给活动 4。在活动 2 保持未分配的这段时间内,您实际上是在消耗该活动的浮动时间。也许当活动 2 的浮动时间变成 15 天时,您最终能够为该活动分配一名开发人员。
To figure out how to assign developers in the previous example once activity 1 is complete, calculate the float of all activities that are possible once activity 1 is complete, and assign the four developers based on the float, from low to high. First, assign a developer to the critical path, not because it is special but because it has the lowest possible float. Now, suppose activity 2 has 60 days of float and activity 4 has 5 days of float. This means that if you defer getting to activity 4 by more than 5 days, you will derail the project. By contrast, you could defer getting to activity 2 by at most 60 days, so you assign the next developer to activity 4. During the intervening time while activity 2 remains unassigned, you are in effect consuming the activity’s float. Perhaps by the time the float of activity 2 has become 15 days, you will be finally able to assign a developer to this activity.
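The selection step described above can be sketched in a few lines. This is only an illustration: the function name `assign_by_float` is hypothetical, the floats for activities 2 and 4 are the ones supposed in the text, and the example uses two developers rather than four because the text does not give floats for the other candidate activities.

```python
def assign_by_float(available, developers):
    """Split available activities into (assigned, deferred), lowest float first.

    available:  {activity name: float in days}
    developers: number of developers free to take an assignment
    """
    ranked = sorted(available, key=lambda activity: available[activity])
    return ranked[:developers], ranked[developers:]


# Floats as supposed in the text; the critical activity has zero float.
available = {"critical activity": 0, "activity 2": 60, "activity 4": 5}
assigned, deferred = assign_by_float(available, developers=2)
print(assigned, deferred)
```

Activity 2 lands in the deferred list, which means it keeps consuming its float until a developer frees up.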
此过程的性质是迭代的,因为最初最低的人员配备水平是未知的,而且使用基于浮动时间的分配会改变活动的浮动时间。首先尝试为项目配备某种资源水平,例如六个资源,然后根据浮动时间分配这些资源。每次安排资源完成活动时,您都会扫描网络以查找最近的可用活动,选择浮动时间最低的活动作为该资源的下一个分配。如果您成功为项目配备人员,请再试一次,这次使用较低的人员配备水平,例如五个甚至四个资源。在某个时候,与可用资源相比,您的活动将过剩。如果这些未分配的活动具有足够高的浮动时间,您可以推迟为它们分配资源,直到某些资源可用。当这些活动未分配时,您将消耗它们的浮动时间。如果活动变得至关重要,那么您就无法使用该人员配备水平来构建项目,而必须满足于更高水平的资源。
The nature of this process is iterative both because initially the lowest level of staffing is unknown and because using float-based assignment changes the floats of the activities. Start by attempting to staff the project with some resource level, such as six resources, and then assign these resources based on float. Every time a resource is scheduled to finish an activity, you scan the network for the nearest available activities, choosing the activity with the lowest float as the next assignment for that resource. If you successfully staff the project, try again, this time with a reduced staffing level such as five or even four resources. At some point, you will have an excess of activities compared with the available resources. If those unassigned activities have high enough float, you could defer assigning resources to them until some resources become available. While these activities are unassigned, you will be consuming their float. If the activities become critical, then you cannot build the project with that staffing level, and you must settle for a higher level of resources.
基于浮动时间分配的另一个关键优势与降低风险有关。浮动时间最少的活动风险最大,最有可能延迟项目。首先将资源分配给这些活动可让您以最安全的方式为项目配备人员,并降低与任何给定人员配备水平相关的总体风险。同样,如果没有项目设计,项目经理或一组开发人员根据浮动时间分配活动的可能性几乎为零。以这种方式工作不仅缓慢且昂贵,而且风险很大。
Another key advantage of float-based assignment relates to risk reduction. Activities with the least float are the riskiest, the ones most capable of delaying the project. Assigning resources to these activities first allows you to staff a project in the safest possible way and reduce the overall risk associated with any given staffing level. Again, without project design, the likelihood that a project manager or a group of developers will assign activities based on float is nearly zero. Working this way is not just slow and expensive, but also risky.
到目前为止的讨论主要集中在活动之间的依赖关系作为构建网络的方式。然而,资源也会影响网络。例如,如果你将图 7-7所示的网络分配给单个开发人员,则实际的网络图将是一个长字符串,而不是图 7-7。对单个资源的依赖性会极大地改变网络图。因此,网络图实际上不仅仅是活动网络,而首先是依赖关系网络。如果你拥有无限的资源和非常灵活的人员配置,那么你只能依赖活动之间的依赖关系。一旦你开始消耗浮动时间,就必须将资源的依赖关系添加到网络中。这里的关键观察是:
The discussion so far has focused on the dependencies between the activities as the way to construct the network. However, the resources also affect the network. For example, if you were to assign the network depicted in Figure 7-7 to a single developer, the actual network diagram would be a long string, not Figure 7-7. The dependency on the single resource drastically changes the network diagram. Therefore, the network diagram is actually not just a network of activities, but first and foremost a network of dependencies. If you have unlimited resources and very elastic staffing, then you can rely only on the dependencies between the activities. Once you start consuming float, you must add the dependencies on the resources to the network. The key observation here is:
资源依赖就是依赖关系。
Resource dependencies are dependencies.
向项目网络分配资源的实际方式是多种变量的产物。分配资源时,必须考虑以下因素:
The actual way of assigning resources to the project network is a product of multiple variables. When you assign resources you must take the following into account:
规划假设
Planning assumptions
关键路径
Critical path
浮动时间
Floats
可用资源
Available resources
约束
Constraints
即使对于简单的项目,这些也总是会导致多种项目设计方案。
These will always result in several project design options, even for straightforward projects.
结合使用项目网络、关键路径和浮动分析,您可以计算出项目的持续时间以及相对于项目开始的每个活动的开始时间。但是,网络中的信息基于工作日,而不是日历日期。您需要通过安排活动将网络中的信息转换为日历日期。您可以使用工具(如 Microsoft Project)轻松执行此任务。在工具中定义所有活动,然后添加依赖项作为前置任务,并根据您的计划分配资源。一旦为项目选择了开始日期,该工具就会安排所有活动。输出可能还包括甘特图,但这与您现在可以从该工具中收集的核心信息无关:项目中每个活动的计划开始和完成日期。
Together, the project network, the critical path, and the float analysis allow you to calculate the duration of the project as well as when each activity should start with respect to the project beginning. However, the information in the network is based on workdays, not on calendar dates. You need to convert the information in the network to calendar dates by scheduling the activities. This is a task that you can easily perform by using a tool (such as Microsoft Project). Define all activities in the tool, then add dependencies as predecessors, and assign the resources according to your plan. Once you select a start date for the project, the tool will schedule all activities. The output may also include a Gantt chart, but that is incidental to the core piece of information you can now glean from the tool: the planned start and completion dates for each activity in the project.
项目所需的人员配备并非随时间恒定不变。一开始,您只需要核心团队。一旦管理层选择了项目设计方案并批准了该项目,您就可以添加开发人员和测试人员等资源。
The required staffing for your project is not constant with time. At the beginning, you need only the core team. Once management selects a project design option and approves the project, you can add resources such as developers and testers.
由于依赖关系和关键路径,并非所有资源都需要一次性投入。同样,并非所有资源都会统一退役。核心团队始终需要投入,但开发人员不应该一直投入到项目的最后一天。理想情况下,您应该在项目开始时随着越来越多的活动成为可能而逐步引入开发人员,并在项目结束时逐步淘汰开发人员。
Not all resources are needed all at once due to the dependencies and the critical path. Much the same way, not all resources are retired uniformly. The core team is required throughout, but developers should not be needed through the last day of the project. Ideally, you should phase in developers at the beginning of the project as more and more activities become possible, and phase out the developers toward the end of the project.
这种逐步投入和逐步撤出资源的方法有两个显著的优势。首先,它避免了许多软件项目所经历的“丰收或饥荒”周期。即使您拥有项目所需的平均人员水平,您也可能会在项目的某个部分人手不足,而在另一部分人手过剩。这种时而闲置、时而大量加班的周期令人沮丧,而且效率极低。其次(也是更重要的),逐步投入资源提供了实现规模经济的可能性。如果您的组织中有多个项目,那么您可以这样安排:让开发人员总是在逐步退出一个项目的同时逐步投入另一个项目。以这种方式工作可以使生产力提高百分之几百,这就是经典的“用更少的资源做更多的事情”。
This approach of phasing in and phasing out resources has two significant advantages. First, it avoids the feast-or-famine cycles experienced by many software projects. Even if you have the required average level of staffing for the project, you could be understaffed in one part of the project and overstaffed in another part. These cycles of idleness or intense overtime are demoralizing and very inefficient. Second (and more importantly), phasing resources offers the possibility of realizing economy of scale. If you have several projects in the organization, then you could arrange them such that developers are always phasing out of one project while phasing into another. Working this way can increase productivity by hundreds of percent: the classic “doing much more with less.”
图 7-8 描绘了一个设计良好、人员配备适当的项目的典型人员分配图。项目开始时是前端,在此期间核心团队正在进行系统和项目设计;此阶段以 SDP 审查结束。如果项目在此时终止,人员配备将变为零,核心团队将可用于其他项目。如果项目获得批准,人员配备将初步增加,开发人员和其他资源将致力于项目中为其他活动提供支撑的最低层活动。当后续活动可以开始时,项目可以吸收更多人员。在某个时候,您已经逐步投入了项目所需的所有资源,达到了人员配备的高峰。一段时间内,项目人员配备齐全。系统往往出现在此阶段的末尾。现在项目可以逐步撤出资源,剩下的人员将致力于依赖程度最高的活动。项目结束时的人员配备水平是系统测试和发布所需的水平。
Figure 7-8 depicts the typical staffing distribution chart of a well-designed and properly staffed project. At the start of the project is the front end, during which the core team is working on the system and project design; this phase ends with the SDP review. If the project is terminated at that point, the staffing goes to zero and the core team is available for other projects. If the project is approved, an initial ramp-up in staffing occurs in which developers and other resources are working on the lowest-level activities in the project that enable other activities. When those activities become available, the project can absorb additional staff. At some point you have phased in all the resources the project ever needs, reaching peak staffing. For a while, the project is fully staffed. The system tends to appear at the end of this phase. Now the project can phase out resources, and those left are working on the most dependent activities. The project concludes with the level of staffing required for system testing and release.
图7-8正确的人员分配
Figure 7-8 Correct staffing distribution
图 7-9展示了一个人员分配图,它演示了图 7-8的行为。要制作如图 7-9这样的图表,首先要为项目配备人员,然后按时间顺序列出所有感兴趣的日期(活动开始和结束的唯一日期)。然后,计算在感兴趣的日期之间的每个时间段内,每种资源类别需要多少资源。不要忘记在人员分配中包含那些没有特定活动但仍然需要的资源,比如核心团队、质量控制和编码活动之间的开发人员。这种堆积条形图在电子表格中很容易制作。本书附带的文件包含这些图表的几个示例项目和模板。
Figure 7-9 shows a staffing distribution chart that demonstrates the behavior of Figure 7-8. You produce a chart such as Figure 7-9 by first staffing the project, then listing all the dates of interest (unique dates when activities start and end) in chronological order. You then count how many resources are required for each category of resources in each time period between dates of interest. Do not forget to include in the staffing distribution resources that do not have specific activities but are nonetheless required, such as the core team, quality control, and developers between coding activities. This sort of stacking bar diagram is trivial to do in a spreadsheet. The files accompanying this book contain several example projects and templates for these charts.
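The counting procedure just described can be sketched as follows. This is a minimal illustration, not the spreadsheet from the book's support files: the function name `staffing_distribution`, the resource categories, and the day numbers are hypothetical. Each entry covers the whole span during which a resource is assigned to the project, including idle time between activities.

```python
def staffing_distribution(assignments):
    """Count resources per category in each period between dates of interest.

    assignments: list of (category, start_day, end_day) spans, one per resource.
    Returns a list of (period_start, period_end, {category: head count}).
    """
    # Dates of interest: every unique day on which a span starts or ends.
    dates = sorted({day for _, start, end in assignments for day in (start, end)})
    periods = []
    for begin, finish in zip(dates, dates[1:]):
        counts = {}
        for category, start, end in assignments:
            # A span either covers a whole period or none of it, because
            # every span boundary is itself a date of interest.
            if start <= begin and end >= finish:
                counts[category] = counts.get(category, 0) + 1
        periods.append((begin, finish, counts))
    return periods


# Hypothetical project: two core team members, two developers, one tester.
assignments = [
    ("core", 0, 100),
    ("core", 0, 100),
    ("developer", 20, 80),
    ("developer", 30, 70),
    ("tester", 60, 100),
]
periods = staffing_distribution(assignments)
for begin, finish, counts in periods:
    print(begin, finish, counts)
```

Each returned tuple corresponds to one stacked bar in the chart; the bar width is the period length, which is why the bars can vary in time resolution.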
图 7-9人员分布示例
Figure 7-9 Sample staffing distribution
由于感兴趣的日期可能不是均匀分布的,人员分布图中的条形图可能会在时间分辨率上有所不同。然而,在大多数规模适中且有足够多活动的项目中,图表的整体形状应该与图 7-8 一致。通过检查人员分配图,您可以快速获得有关项目设计质量的宝贵反馈。
Since the dates of interest may not be regularly spaced, the bars in the staffing distribution chart may vary in time resolution. However, in most decent-size projects with enough activities, the overall shape of the chart should follow that of Figure 7-8. By examining the staffing distribution chart, you get quick and valuable feedback on the quality of your project design.
人员分配图中可能会出现几个常见的项目人员配置错误。如果图表呈矩形,则意味着人员配置是固定的——这种做法我之前已经警告过。
Several common project staffing mistakes may be evident in the staffing distribution chart. If the chart looks rectangular, it implies constant staffing—a practice against which I have already cautioned.
人员分布图中间出现巨大峰值(如图 7-10所示)也是一个危险信号:这样的峰值始终表示浪费。
A staffing distribution with a huge peak in the middle of the chart (as shown in Figure 7-10) is also a red flag: Such a peak always indicates waste.
图 7-10人员分布高峰
Figure 7-10 Peak in staffing distribution
考虑一下,当您只使用人员很短的一段时间时,在招聘人员和培训他们了解领域、架构和技术方面所花费的精力。峰值通常是由于项目没有消耗足够的浮动时间而导致的,从而导致资源需求激增。如果项目用一些浮动时间换取资源,曲线就会更平滑。图 7-11描绘了一个人员配备达到峰值的示例项目。
Consider the effort expended in hiring people and training them on the domain, architecture, and technology when you use them for only a short period of time. A peak is usually caused by not consuming enough float in the project, resulting in a spike in resource demand. If the project were to trade some float for resources, the curve would be smoother. Figure 7-11 depicts a sample project with a peak in staffing.
图 7-11人员分布样本峰值
Figure 7-11 Sample peak in staffing distribution
人员分配图中的一条平线(如图 7-12 所示)是另一个典型的错误。平线表示图 7-8 中的高位平台不存在。该项目很可能处于亚临界状态,缺少为原计划中的非关键活动配备人员所需的资源。
A flat line in the staffing distribution chart (as shown in Figure 7-12) is yet another classic mistake. The flat line indicates the absence of the high plateau of Figure 7-8. The project is likely subcritical and is missing the resources to staff the noncritical activities of the original plan.
图 7-12扁平亚临界人员配置分布
Figure 7-12 Flat subcritical staffing distribution
图 7-13显示了亚临界项目示例的人员配置分布。该项目在资源水平达到 11 或 12 时就进入亚临界状态。它不仅没有达到稳定水平,反而出现了低谷。
Figure 7-13 shows the staffing distribution for a sample subcritical project. This project goes subcritical at a level of 11 or 12 resources. It is not just missing the plateau, but has a valley instead.
图 7-13次关键人员配置分布示例
Figure 7-13 Sample subcritical staffing distribution
不稳定的人员配置分布(如图 7-14所示)是另一个危险信号。在设计时考虑到这种弹性的项目注定会令人失望(见图7-15),因为人员配置永远不可能那么有弹性。大多数项目无法凭空变出人员,让他们立即投入生产,然后过一会儿再将他们解雇。此外,当人们不断从一个项目来来去去时,培训(或再培训)他们的成本非常高。在这种情况下,很难让人们承担责任或保留他们的知识。
Erratic staffing distributions (as in Figure 7-14) are yet another distress signal. Projects that are designed with this kind of elasticity in mind are due for a disappointment (see Figure 7-15) because staffing can never be that elastic. Most projects cannot conjure people out of thin air, have them be instantly productive, and then dispose of them a moment later. In addition, when people constantly come and go from a project, training (or retraining) them is very expensive. It is difficult to hold people accountable or retain their knowledge under such circumstances.
图 7-14人员分布不均
Figure 7-14 Erratic staffing distribution
图 7-15人员配置不规律示例
Figure 7-15 Sample erratic staffing distribution
图 7-16显示了另一种需要避免的人员配置分布,即项目开始时的高人手增加。虽然这个图没有包含任何数字,但图表清楚地表明了这种一厢情愿的想法。没有一个团队可以立即从零人员配置达到峰值,让每个人都创造价值并提供高质量、值得生产的代码。即使项目最初有那么多并行工作,即使你拥有资源,下游网络也会限制项目实际上可以吸收的资源数量,所需的人员配置就会逐渐减少。
Figure 7-16 illustrates another staffing distribution to avoid, the high ramp-up coming into the project. While this figure does not include any numbers, the chart clearly indicates wishful thinking. No team can instantly go from zero to peak staffing and have everyone add value and deliver high-quality, production-worthy code. Even if the project initially has that much parallel work, and even if you have the resources, the network downstream throttles how many resources the project can actually absorb beyond that, and the required staffing fizzles out.
图 7-16人员配置提升幅度较大
Figure 7-16 High ramp-up in staffing distribution
图 7-17 展示了这样一个项目。该计划预计会立即达到 11 人,之后不久就会减少到 6 人左右,直到项目结束。任何团队都不太可能以这种方式扩充人员,而且由于团队规模过大,可用资源的利用效率很低。
Figure 7-17 demonstrates such a project. This plan expects instantaneously to get to 11 people, and shortly afterward deflates to around six people until the end of the project. It is improbable that any team can ramp up this way, and the available resources are used inefficiently due to the oversized team.
图 7-17人员配置分布初始高坡度示例
Figure 7-17 Sample initial high ramp in staffing distribution
从图表中直观的错误指标可以看出,好的项目人员分配很顺畅。当你顺利完成项目时,生活会好得多,而不是急转弯或经历急加速和紧急刹车。
A key observation from the visual indicators of mistakes in the charts is that good projects have smooth staffing distributions. Life is much better when you are cruising along through your project rather than negotiating sharp turns or experiencing screaming acceleration and emergency braking.
正如刚才提到的,人员配置不当的两个根本原因是假设人员配置过于弹性以及在分配资源时不利用浮动时间。在考虑人员配置弹性时,您必须了解您的团队,并充分了解可用性和效率方面的可行性。人员配置弹性的程度还取决于组织的性质以及系统和项目设计的质量。设计越好,开发人员就能越快地适应新系统和活动。在大多数项目中,利用浮动时间很容易做到,而且可能会降低人员配置的波动性和所需人员的绝对水平。对人员配置弹性采取更现实的态度并利用浮动时间通常可以消除峰值、起伏和高涨幅。
As just mentioned, the two root causes of incorrect staffing are assuming too elastic staffing and not consuming float when assigning resources. When considering staffing elasticity, you have to know your team and have a good grasp on what is feasible as far as availability and efficiency. The degree of staffing elasticity also depends on the nature of the organization and the quality of the system and project design. The better the designs, the more quickly developers can come to terms with the new system and activities. Consuming float is easy to do in most projects and likely to reduce both the volatility in the staffing and the absolute level of the required staffing. Being more realistic about staffing elasticity and consuming float often eliminate the peaks, the ups-and-downs, and the high ramp-ups.
绘制每个项目设计方案的人员分配图是很好的验证辅助工具,有助于反思该方案并查看其是否合理。在项目设计中,如果感觉不对劲,通常就是出了问题。
Plotting the staffing distribution chart for each project design option is a great validation aid in reflecting on the option and seeing if it makes sense. With project design, if something does not feel right, more often than not, something is indeed wrong.
绘制人员分布图还有另一个明显的好处:它可以帮助您计算出项目成本。与实体建筑项目不同,软件项目没有商品或原材料成本。软件成本绝大部分是人工成本。这种人工成本包括所有团队成员,从核心团队到开发人员和测试人员。人工成本就是人员配备水平乘以时间:
Drawing the staffing distribution chart offers another distinct benefit: It is how you figure out the cost of the project. Unlike physical construction projects, software projects do not have a cost of goods or raw materials. The cost of software is overwhelmingly in labor. This labor includes all team members, from the core team to the developers and testers. Labor cost is simply the staffing level multiplied by time:
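The relation that this sentence introduces can be written out as a plain restatement of the sentence itself (the symbol names are only a notational choice):

```latex
\[
\text{Cost} = \text{Staffing} \times \text{Time}
\]
```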
人员配置乘以时间,其实就是人员分布图下面的面积,要算成本,就需要算出那个面积。
Multiplying staffing by time is actually the area under the staffing distribution chart. To calculate the cost, you need to calculate that area.
人员分布图是项目的离散模型,在感兴趣的日期之间的每个时间段内都有垂直条(人员配备水平)。通过将每个垂直条的高度(人数)乘以其感兴趣的日期之间的时间段长度,可以计算出人员分布图下的面积(图 7-18)。然后将这些乘法的结果相加。
The staffing distribution chart is a discrete model of the project that has vertical bars (the staffing level) in each time period between dates of interest. You calculate the area under the staffing distribution chart by multiplying the height of each vertical bar (the number of people) by the duration of the time period between its dates of interest (Figure 7-18). You then sum the results of these multiplications.
图 7-18计算项目成本
Figure 7-18 Calculating project cost
人员配置图下方面积的计算公式为:
The formula for the calculation of the area under the staffing chart is:
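The formula is not present at this point in the text. Reading it off the definitions that follow and the bar-by-bar multiplication described above, it can presumably be written as:

```latex
\[
\text{Cost} = \sum_{i=1}^{n} S_i \cdot \left( T_i - T_{i-1} \right)
\]
```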
其中:
where:
$S_i$ 是感兴趣的日期 $i$ 的人员配备水平。
$S_i$ is the staffing level at date of interest $i$.
$T_i$ 是感兴趣的日期 $i$($T_0$ 是开始日期)。
$T_i$ is the date of interest $i$ ($T_0$ is the start date).
$n$ 是项目中感兴趣的日期数。
$n$ is the number of dates of interest in the project.
找到人员分配图下的面积是回答该项目需要花费多少钱的唯一方法。
Finding the area under the staffing distribution chart is the only way to answer the question of how much the project will cost.
如果您使用电子表格来制作人员分布图,则只需添加另一列,其中包含运行总和以计算图表下的面积(本质上是数值积分)。本书附带的支持文件包含此计算的几个示例。
If you use a spreadsheet to produce the staffing distribution chart, you just need to add another column with a running sum to calculate the area under the chart (in essence, a numerical integration). The support files accompanying this book contain several examples of this calculation.
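The running-sum column can also be sketched outside a spreadsheet. The following is an illustration only: the function name `running_cost`, the dates, and the staffing levels are hypothetical, and the result is expressed in man-days because the dates are counted in workdays.

```python
def running_cost(dates, staffing):
    """Return the cumulative area under the staffing chart after each period.

    dates:    dates of interest in workdays, starting with the project start
    staffing: head count during each period; staffing[k] applies between
              dates[k] and dates[k + 1]
    """
    if len(staffing) != len(dates) - 1:
        raise ValueError("need exactly one staffing level per period")
    totals = []
    total = 0
    for k, level in enumerate(staffing):
        # Area of one bar: head count times the length of the period.
        total += level * (dates[k + 1] - dates[k])
        totals.append(total)
    return totals


# Hypothetical dates of interest and staffing levels.
dates = [0, 20, 30, 60, 70, 80, 100]
staffing = [2, 3, 4, 5, 4, 3]
costs = running_cost(dates, staffing)
print(costs)        # cumulative cost after each period, in man-days
print(costs[-1])    # total project cost
```

The last entry of the running sum is the project cost; the intermediate entries show how cost accumulates over time.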
由于成本定义为人员配置乘以时间,因此成本单位应为工作量和时间,例如人月或人年。最好使用这些单位而不是货币来抵消工资、当地货币和预算的差异。这样就可以客观地比较不同项目设计方案的成本。
Since cost is defined as staffing multiplied by time, the units of cost should be effort and time, such as man-month or man-year. It is better to use these units as opposed to currency to neutralize differences in salary, local currencies and budgets. It then becomes possible to objectively compare the cost of different project design options.
考虑到架构、初始工作分解和工作量估算,回答构建系统需要多长时间和需要多少成本的问题最多只需要几个小时到一天的时间。遗憾的是,大多数软件项目都是盲目运行的。这就像玩扑克而不看牌一样明智——只不过,你赌的不是筹码,而是你的项目、你的职业前景,甚至是公司的未来。
Given the architecture, the initial work breakdown, and the effort estimation, it is a matter of a few hours to a day at the most to answer the questions of how long it will take and how much it will cost to build the system. Sadly, most software projects are running blind. This is as sensible as playing poker without ever looking at the cards—except, instead of chips, you have your project, your career prospects, or even the company’s future on the line.
一旦知道了项目成本,就可以计算项目效率。项目效率是所有活动的工作量总和(假设人员利用率完美)与实际项目成本之间的比率。例如,如果所有活动的工作量总和为 10 个人月(假设一个月有 30 个工作日),而项目成本为 50 个人月(正常工作日),则项目效率为 20%。
Once the project cost is known, you can calculate the project efficiency. The efficiency of a project is the ratio between the sum of effort across all activities (assuming perfect utilization of people) and the actual project cost. For example, if the sum of effort across all activities is 10 man-months (assuming 30 workdays in a month), and the project cost is 50 man-months (of regular workdays), then the project efficiency is 20%.
项目效率是项目设计质量和合理性的一个重要指标。一个设计良好的系统以及一个设计和人员配备得当的项目的预期效率在 15% 到 25% 之间。
The project efficiency is a great indicator of the quality and sanity of the project’s design. The expected efficiency of a well-designed system, along with a properly designed and staffed project, ranges between 15% and 25%.
这些效率率可能看起来低得惊人,但更高的效率实际上是项目计划不切实际的有力指标。自然界中没有任何过程能够达到 100% 的效率。没有一个项目是不受限制的,这些限制会阻止您以最有效的方式利用资源。当您添加核心团队、测试人员、构建和 DevOps 以及与项目相关的所有其他资源的成本时,用于编写代码的努力部分会大大减少。效率高达 40% 的项目根本无法构建。
These efficiency rates may seem appallingly low, but higher efficiency is actually a strong indicator of an unrealistic project plan. No process in nature can ever even approach 100% efficiency. No project is free from constraints, and these constraints prevent you from leveraging your resources in the most efficient way. By the time you add the cost of the core team, the testers, the Build and DevOps, and all the other resources associated with your project, the portion of the effort devoted to just writing code is greatly diminished. Projects with high efficiency such as 40% are simply impossible to build.
即使 25% 的效率也偏高,前提是拥有正确的系统架构,为项目提供最高效的团队(见图7-1),以及正确的项目设计,使用最少的资源并根据浮动时间进行分配。实现高效率预期所需的其他因素包括一支小型、经验丰富的团队,其成员习惯于一起工作,以及一位致力于质量并能处理项目复杂性的项目经理。
Even 25% efficiency is on the high side and is predicated on having a correct system architecture that will provide the project with the most efficient team (see Figure 7-1) and a correct project design that uses the smallest level of resources and assigns them based on floats. Additional factors required for delivering on high efficiency expectations include a small, experienced team whose members are accustomed to working together, and a project manager who is committed to quality and can handle the complexity of the project.
效率还与人员配置弹性相关。如果人员配置真正具有弹性(即,您总是可以在需要时获得资源,并在不再需要时精确地释放它们),那么效率就会很高。当然,人员配置永远不会那么有弹性,因此有时资源在仍分配给项目时会闲置,从而降低效率。在利用关键路径之外的资源时尤其如此。如果一个人负责所有关键活动,那么这个人实际上处于最高效率,因为这个人连续地处理活动,并且这项工作的成本接近关键活动成本的总和。对于非关键活动,总是存在浮动。由于人员配置永远没有真正的弹性,因此关键路径之外的资源永远无法得到非常高的利用效率。
Efficiency also correlates with staffing elasticity. If staffing were truly elastic (i.e., you could always get resources just when you need them and let them go at the precise moment when you no longer need them), the efficiency would be high. Of course, staffing is never that elastic, so sometimes resources will be idle while still assigned to the project, driving the efficiency down. This is especially the case when utilizing resources outside the critical path. If a single person is working on all critical activities, that person is actually at peak efficiency because the person works on activities back-to-back, and the cost of that effort approaches the sum of the cost of the critical activities. With noncritical activities, there is always float. Since staffing is never truly elastic, the resources outside the critical path can never be utilized at very high efficiency.
如果项目设计方案的预期效率很高,那么你必须调查根本原因。也许你假设的人员配置过于宽松和弹性,或者项目网络过于关键。毕竟,如果大多数网络路径都是关键的或接近关键的(大多数活动的浮动时间很短),那么你将获得较高的效率比。然而,这样的项目显然存在无法履行承诺的高风险。
If the project design option has a high expected efficiency, you must investigate the root cause. Perhaps you assumed too liberal and elastic staffing or the project network is too critical. After all, if most network paths are either critical or near-critical (most activities have low float), then you would get a high efficiency ratio. However, such a project is obviously at high risk of not meeting its commitments.
软件项目的效率与组织的性质密切相关。低效的组织不会在一夜之间变得高效,反之亦然。效率还与业务性质有关。为医疗设备开发软件的项目所需的开销与开发社交媒体插件的小型初创公司所需的开销不同。
The efficiency of software projects is tightly correlated with the nature of the organization. Inefficient organizations do not turn efficient overnight, and vice versa. Efficiency also relates to the nature of the business. The overhead required in a project that produces software for a medical device will differ from that of a small startup developing a social media plug-in.
您可以将效率用作另一种广泛的项目估算技术。假设您知道您的项目过去效率为 20%。一旦您有了各个活动的细分及其估算,只需将所有活动的工作量总和(假设利用率完美)乘以 5,即可得出粗略的总体项目成本。
You can use efficiency as yet another broad project estimation technique. Suppose you know that historically your projects were 20% efficient. Once you have your individual activity breakdown and their estimations, simply multiply the sum of effort (assuming perfect utilization) across all activities by 5 to produce a rough overall project cost.
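Both the efficiency ratio and the rough estimate derived from it are simple divisions. A minimal sketch, with hypothetical function names, using the figures from the text (10 man-months of effort, 50 man-months of cost, 20% historical efficiency):

```python
def project_efficiency(effort_sum, project_cost):
    """Ratio of summed activity effort to actual project cost (same units)."""
    return effort_sum / project_cost


def rough_project_cost(effort_sum, historical_efficiency):
    """Broad cost estimate from summed activity effort and past efficiency."""
    return effort_sum / historical_efficiency


efficiency = project_efficiency(10, 50)    # man-months of effort vs. cost
estimate = rough_project_cost(10, 0.20)    # dividing by 0.20 multiplies by 5
print(f"efficiency: {efficiency:.0%}, rough cost: {estimate:.0f} man-months")
```

Dividing by the historical efficiency is the same as the "multiply by 5" shortcut for a 20% efficient organization.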
另一种富有洞察力的项目设计技术是挣值规划。挣值是一种常用的项目跟踪方法,但您也可以将其用作出色的项目设计工具。通过挣值规划,您可以为项目完成的每项活动分配价值,然后将其与每项活动的时间表相结合,以了解您计划如何根据时间赚取价值。
Another insightful project design technique is earned value planning. Earned value is a popular means of tracking a project, but you can also use it as a great project design tool. With earned value planning you assign value to each activity toward the completion of the project, and then combine it with the schedule of each activity to see how you plan to earn value as a function of time.
计划挣值的公式为:
The formula for the planned earned value is:
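The formula is not present at this point in the text. From the definitions that follow and the verbal description of the ratio, it can presumably be written as (the name $EV(t)$ is a notational assumption):

```latex
\[
EV(t) = \frac{\sum_{i=1}^{m} E_i}{\sum_{i=1}^{N} E_i}
\]
```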
其中:
where:
$E_i$ 是活动 $i$ 的预计持续时间。
$E_i$ is the estimated duration for activity $i$.
$m$ 是在时刻 $t$ 已完成的活动数目。
$m$ is the number of activities completed at time $t$.
$N$ 是项目中的活动数。
$N$ is the number of activities in the project.
$t$ 是一个时间点。
$t$ is a point in time.
时刻 $t$ 的挣值是到时刻 $t$ 为止已完成的所有活动的预计持续时间总和与所有活动的预计持续时间总和之比。
The earned value at time $t$ is the ratio of the sum of the estimated durations of all activities completed by time $t$ to the sum of the estimated durations of all activities.
例如,考虑表 7-1中非常简单的项目。
Consider, for example, the very simple project in Table 7-1.
表 7-1项目挣值示例
Table 7-1 Sample project earned value
| 活动 Activity | 持续时间(天) Duration (days) | 价值 (%) Value (%) |
|---|---|---|
| 前端 Front End | 40 | 20 |
| 接入服务 Access Service | 30 | 15 |
| 用户界面 UI | 40 | 20 |
| 经理服务 Manager Service | 20 | 10 |
| 公共事业服务 Utility Service | 40 | 20 |
| 系统测试 System Testing | 30 | 15 |
| 全部的 Total | 200 | 100 |
表 7-1中所有活动的预计工期总和为 200 天。例如,UI 活动预计为 40 天。由于 40 是 200 的 20%,因此您可以说,通过完成 UI 活动,您已经为项目完成赢得了 20% 的回报。从您的活动安排中,您还可以知道 UI 活动的预计完成时间,因此您实际上可以计算出您计划如何根据时间赚取价值(表 7-2)。
The sum of estimated duration across all activities in Table 7-1 is 200 days. The UI activity, for example, is estimated at 40 days. Since 40 is 20% of 200, you could state that by completing the UI activity, you have earned 20% toward the completion of the project. From your scheduling of activities you also know when the UI activity is scheduled to complete, so you can actually calculate how you plan to earn value as a function of time (Table 7-2).
表 7-2计划挣值随时间变化的示例
Table 7-2 Sample planned earned value as function of time
| 活动 Activity | 完成日期 Completion Date | 价值 (%) Value (%) | 挣值 (%) Earned Value (%) |
|---|---|---|---|
| 开始 Start | 0 | 0 | 0 |
| 前端 Front End | t1 | 20 | 20 |
| 接入服务 Access Service | t2 | 15 | 35 |
| 用户界面 UI | t3 | 20 | 55 |
| 经理服务 Manager Service | t4 | 10 | 65 |
| 公共事业服务 Utility Service | t5 | 20 | 85 |
| 系统测试 System Testing | t6 | 15 | 100 |
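The Value and Earned Value columns of Tables 7-1 and 7-2 can be reproduced with a short calculation. This is a sketch under one assumption: the activities complete in the order listed in Table 7-2. The function name `planned_earned_value` is hypothetical; the durations are those of Table 7-1.

```python
def planned_earned_value(activities):
    """Return (name, value %, cumulative earned value %) per activity.

    activities: list of (name, estimated duration in days), given in the
                planned order of completion.
    """
    total = sum(duration for _, duration in activities)
    earned = 0
    rows = []
    for name, duration in activities:
        earned += duration
        rows.append((name, 100 * duration / total, 100 * earned / total))
    return rows


# Estimated durations from Table 7-1, in the completion order of Table 7-2.
activities = [
    ("Front End", 40),
    ("Access Service", 30),
    ("UI", 40),
    ("Manager Service", 20),
    ("Utility Service", 40),
    ("System Testing", 30),
]
rows = planned_earned_value(activities)
for name, value, earned_value in rows:
    print(f"{name:16} {value:5.0f} {earned_value:5.0f}")
```

Plotting the cumulative column against the planned completion dates t1 through t6 gives the planned earned value curve.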
图 7-19显示了这样的计划进度图。当项目到达计划完成日期时,它应该已经赚到了 100% 的价值。图 7-19中的关键观察是,计划挣值曲线的坡度代表了团队的产出量。如果你将完全相同的项目分配给一个更好的团队,他们会更快地达到相同的 100% 挣值,因此他们的曲线会更陡峭。
Such a chart of planned progress is shown in Figure 7-19. By the time the project reaches the planned completion date, it should have earned 100% of the value. The key observation in Figure 7-19 is that the pitch of the planned earned value curve represents the throughput of the team. If you were to assign exactly the same project to a better team, they would meet the same 100% of earned value sooner, so their line would be steeper.
图 7-19计划挣值图
Figure 7-19 Planned earned value chart
意识到可以从挣值图中衡量团队的预期产出量,就可以快速识别项目计划中的错误。例如,请考虑图 7-20 中的计划挣值图。世界上没有一个团队能够按照这样的计划完成任务。在项目的大部分时间里,预期产出量很低(曲线很平缓)。什么样的生产力奇迹才能在项目结束时让挣值像火箭般飞速增长?
The realization that you can gauge the expected throughput of the team from the earned value chart enables you to quickly discern mistakes in your project plan. For example, consider the planned earned value chart in Figure 7-20. No team in the world could ever deliver on such a plan. For much of the project, the expected throughput is low (the curve is shallow). What kind of miracle of productivity would deliver the rocket launch of earned value toward the end of the project?
图 7-20不切实际的乐观计划
Figure 7-20 Unrealistically optimistic plan
这种不切实际、过于乐观的计划通常是倒推的结果。该计划甚至可能以最好的意图开始,沿着关键路径前进。不幸的是,你发现有人已经在某个特定日期承诺了该项目,而没有考虑项目设计或团队的实际能力。然后,你将剩余的活动塞进最后期限,基本上就是倒推。只有通过绘制计划的挣值,你才能引起人们对该计划不切实际的注意,并试图避免失败。图 7-21描绘了一个具有此类行为的项目。
Such unrealistic, overly optimistic plans are usually the result of back-scheduling. The plan may even start with the best of intentions, progressing along the critical path. Unfortunately, you find that someone has already committed the project on a specific date with no regard for a project design or the actual capabilities of the team. You then take the remaining activities and cram them against the deadline, basically back-scheduling from that. Only by plotting the planned earned value will you be able to call attention to the impracticality of this plan and try to avert failure. Figure 7-21 depicts a project with such behavior.
图 7-21不切实际的乐观计划示例
Figure 7-21 Sample unrealistically optimistic plan
类似地,你还可以发现不切实际的悲观计划,如图7-22所示。这个项目一开始很好,但随后生产力预计会突然下降——或者更可能的是,项目的时间比要求的要多得多。图 7-22中的项目将会失败,因为它允许镀金和复杂性抬头。你甚至可以从曲线的健康部分推断出项目应该何时完成(曲线拐点上方的某个位置)。
In much the same way, you can detect unrealistically pessimistic plans such as that shown in Figure 7-22. This project starts out well, but then productivity is expected to suddenly diminish—or more likely, the project was given much more time than was required. The project in Figure 7-22 will fail because it allows for gold plating and complexity to raise their heads. You can even extrapolate from the healthy part of the curve when the project should have finished (somewhere above the knee in the curve).
图 7-22不切实际的悲观计划
Figure 7-22 Unrealistically pessimistic plan
采用固定规模团队的项目在计划挣值图上总是会呈现一条直线。如前所述,您不应该让团队规模固定。一个人员配备适当且设计良好的项目,其挣值图总是一条平缓的 S 曲线,如图 7-23 所示。
A project utilizing a fixed-size team always results in a straight line on the planned earned value chart. As mentioned already, you should not keep the team size fixed. A properly staffed and well-designed project always results in a shallow S curve for the earned value chart, as shown in Figure 7-23.
图 7-23平缓的 S 曲线
Figure 7-23 Shallow S curve
计划挣值曲线的形状与计划人员配置有关。在项目开始时,只有核心团队可用,因此前端没有太多可衡量的价值增加,挣值曲线的坡度几乎是平坦的。在 SDP 评审之后,项目可以开始增加人员。随着团队规模的扩大,您还将增加其产出量,因此挣值曲线会变得越来越陡峭。在某个时候,您会达到人员配置的峰值。在一段时间内,团队规模基本固定,因此在曲线中心的最大产出量处有一条直线。一旦您开始逐步淘汰资源,挣值曲线就会趋于平稳,直到项目完成。图 7-24显示了一条浅 S 曲线的示例。
The shape of the planned earned value curve is related to the planned staffing distribution. At the beginning of the project, only the core team is available, so not much measurable value is added at the front end, and the pitch of the earned value curve is almost flat. After the SDP review, the project can start adding people. As you increase the size of the team, you will also increase its throughput, so the earned value curve gets steeper and steeper. At some point you reach peak staffing. For a while the team size is mostly fixed, so there is a straight line at maximum throughput in the center of the curve. Once you start phasing out resources, the earned value curve levels off until the project completes. Figure 7-24 shows a sample shallow S curve.
图 7-24浅 S 曲线示例
Figure 7-24 Sample shallow S curve
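The link between staffing and the shape of the curve can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption that is not from the text: the value earned in each period is taken to be proportional to the number of people staffed in that period. The function name `planned_earned_value` and the staffing numbers are hypothetical.

```python
from itertools import accumulate


def planned_earned_value(staffing):
    """Return the cumulative planned earned value (0.0 to 1.0) per period.

    Assumption: value earned in a period is proportional to the number
    of people staffed in that period.
    """
    total = sum(staffing)
    return [done / total for done in accumulate(staffing)]


# Hypothetical plan: core team only, ramp-up, peak staffing, phase-out.
staffing = [1, 1, 2, 4, 6, 6, 6, 4, 2, 1]
curve = planned_earned_value(staffing)

for period, value in enumerate(curve, start=1):
    print(f"period {period:2d}: {value:6.1%}")

# A fixed-size team earns the same amount every period: a straight line.
fixed_team = planned_earned_value([3, 3, 3, 3])
```

With the ramped staffing plan, the increments are small at both ends and largest in the middle, which is the shallow S. With the fixed team, every increment is identical.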
挣值曲线是一种简单易行的方法来回答这个问题:“这个计划合理吗?”如果计划挣值是一条直线,或者它表现出图 7-20或图 7-22的问题,则该项目处于危险之中。如果它看起来像一个浅 S,那么至少你还有希望,这个计划是合理和合理的。
The earned value curve is a simple and easy way to answer the question: “Does the plan make sense?” If the planned earned value is a straight line, or it exhibits the issues of Figure 7-20 or Figure 7-22, the project is in danger. If it looks like a shallow S, then at least you have hope that the plan is sound and sensible.
架构师负责设计系统和构建系统的项目。架构师可能是团队中唯一对正确的架构、技术限制、活动之间的依赖关系、系统和项目的设计约束以及相关资源技能具有洞察力和远见的成员。指望管理层、项目经理、产品经理或开发人员来设计项目是徒劳的。他们所有人都缺乏设计项目所需的洞察力、信息和培训。此外,设计项目不是他们的工作。然而,架构师确实需要项目经理对资源成本、可用性场景、规划假设、优先级、可行性甚至所涉及的政治的投入、洞察力和观点,就像产品经理在制定架构方面至关重要一样。
It is up to the architect to design both the system and the project to build that system. The architect is likely the only member of the team with the insight and perspective on the correct architecture, the limits of the technology, the dependencies between the activities, the design constraints of both the system and the project, and the relative resource skills. It is futile to expect management, project managers, product managers, or developers to design the project. All of them simply lack the insight, information, and training required to design a project. Furthermore, designing the project is not part of their job. However, the architect does need the input, insight, and perspective of the project manager on the resources cost, the availability scenarios, planning assumptions, priorities, feasibility, and even the politics involved, just as the product manager is essential in producing the architecture.
建筑师将项目设计为遵循系统设计的连续设计工作。此过程与其他所有工程学科中使用的过程相同:项目设计是工程工作的一部分,绝不会留给施工工人和工头在现场或工厂车间去解决。
The architect designs the project as a continuous design effort following the system design. This process is identical to that used in every other engineering discipline: The design of the project is part of the engineering effort and is never left for the construction workers and foremen to figure out on-site or on the factory floor.
架构师不负责管理和跟踪项目。相反,项目经理会将实际开发人员分配到项目中,并跟踪他们相对于计划的进度。当执行过程中出现变化时,项目经理和架构师都需要一起完成闭环并重新设计项目。
The architect is not responsible for managing and tracking the project. Instead, the project manager assigns the actual developers to the project and tracks their progress against the plan. When things change during execution, both the project manager and the architect need to close the loop together and redesign the project.
意识到架构师应该设计项目是架构师角色成熟的一部分。对架构师的需求出现在 20 世纪 90 年代末,以应对软件系统的拥有成本和复杂性的增加。现在要求架构师设计能够实现可维护性、可重用性、可扩展性、可行性、可伸缩性、吞吐量、可用性、响应性、性能和安全性的系统。所有这些都是设计属性,解决它们的方法不是通过技术或关键字,而是通过正确的设计。
The realization that the architect should design the project is part of the maturity of the role of the architect. The demand for architects emerged in the late 1990s in response to the increased cost of ownership and complexity of software systems. Architects are now required to design systems that enable maintainability, reusability, extensibility, feasibility, scalability, throughput, availability, responsiveness, performance, and security. All of these are design attributes, and the way to address them is not via technology or keywords, but with correct design.
但是,这份设计属性列表并不完整。本章从成功的定义开始,要想成功,你必须在该列表中添加进度、成本和风险。这些与其他属性一样都是设计属性,你可以通过设计项目来提供它们。
However, that list of design attributes is incomplete. This chapter started with the definition of success, and to succeed you must add to that list schedule, cost, and risk. These are design attributes as much as the others, and you provide them by designing the project.
项目网络是规划项目的逻辑表示。分析网络的技术称为关键路径法,尽管它与非关键活动的关系与关键活动的关系一样密切。关键路径分析非常适合复杂的项目,从物理构造到软件系统,并且它拥有数十年的成功记录。通过执行此分析,您可以找到项目持续时间并确定在何时何地分配资源。
The project network acts as a logical representation of the project for planning purposes. The technique for analyzing the network is called the critical path method, although it has as much to do with the noncritical activities as it does with the critical ones. Critical path analysis is admirably suited for complex projects, ranging from physical construction to software systems, and it has a decades-long proven track record of success. By performing this analysis, you find the project duration and determine where and when to assign your resources.
由于项目网络对项目设计至关重要,本章扩展了第 7 章项目设计概述中介绍的一些概念。您将看到反复出现的技术、术语和通用概念,它们独立于任何特定项目甚至任何行业。本章的思想使项目分析变得客观且可重复。任何两个架构师分析同一个项目网络都应该产生非常可比的结果。
Because the project network is so instrumental to project design, this chapter expands on a few of the concepts that were introduced in Chapter 7, in the project design overview. You will see recurring techniques, terms, and universal concepts that are independent of any specific project or even any industry. The ideas of this short chapter enable objective and repeatable analysis of the project. Any two architects analyzing the same project network should produce very comparable results.
软件项目中的活动是任何需要时间和资源的任务。活动可能包括架构、项目设计、服务构建、系统测试,甚至培训课程。项目是相关活动的集合,网络图捕获这些活动及其依赖关系。在网络图中,活动之间没有执行顺序或并发的概念。
An activity in a software project is any task that requires both time and a resource. Activities may include architecture, project design, service construction, system testing, and even training classes. The project is a collection of related activities, and the network diagram captures these activities and their dependencies. In network diagrams, there is no notion of order of execution or concurrency between the activities.
网络图通常故意不按比例显示,以便您可以专注于网络的依赖关系和一般拓扑。在大多数情况下,避免按比例显示还可以简化项目的设计。当估算发生变化、添加或删除活动或重新安排活动时,尝试保持网络图按比例显示会造成严重负担。
Network diagrams are often deliberately not shown to scale so that you can focus purely on dependencies and general topology of the network. Avoiding scale in most cases also simplifies the design of the project. Attempts to keep the network diagram to scale will impose a serious burden when estimations change, when you add or remove activities, or when you reschedule activities.
项目网络图有两种可能的表示形式:节点图和箭线图(图 8-1)。
There are two possible representations of a project network diagram: a node diagram and an arrow diagram (Figure 8-1).
图 8-1节点图(左)和等效箭头图(右)
Figure 8-1 A node diagram (left) and the equivalent arrow diagram (right)
在节点图中,图表中的每个节点代表一项活动。例如,在图 8-1的左侧,每个圆圈都是一项活动。节点图中的箭头表示活动之间的依赖关系,箭头的长度无关紧要。箭头上没有花费任何时间;相反,所有时间都花在节点内。除了增加节点的半径外,没有简单的方法可以按比例绘制节点图。这样做往往会使图表混乱,并且难以正确解释。
With a node diagram, each node in the chart represents an activity. For example, on the left side of Figure 8-1, every circle is an activity. The arrows in a node diagram represent dependencies between the activities, and the length of the arrow is irrelevant. There is no time spent along the arrows; instead, all time is spent inside the nodes. There is no simple way of drawing node diagrams to scale other than by increasing the radius of the nodes. Doing so tends to clutter the diagram and makes it difficult to interpret correctly.
在箭线图中,箭头表示活动,节点表示对进入活动的依赖关系,以及所有进入活动完成时发生的事件,如图 8-1右侧所示。请注意,图 8-1中的两个图描绘的是同一个网络,说明这两种图类型是等效的(即,任何网络都可以使用任一符号来呈现)。由于箭线图中的节点表示事件,因此节点内永远不会花费任何时间;也就是说,事件是瞬时的。与节点图一样,时间沿着箭头的方向流逝。要按比例绘制箭线图,需要将时间缩放到箭头的长度。也就是说,箭头的长度通常无关紧要(在本书中,除非明确说明,否则所有网络图均不按比例绘制)。
With an arrow diagram, arrows represent activities, and the nodes represent dependencies on the entering activities, as well as events that occur when all the entering activities complete, as shown on the right side of Figure 8-1. Note that both diagrams in Figure 8-1 depict the same network, illustrating that the two diagram types are equivalent (i.e., any network can be rendered using either notation). Since the nodes in an arrow diagram represent events, no time is ever spent inside a node; that is, the events are instantaneous. As with node diagrams, time passes in the direction of the arrows. To draw an arrow diagram to scale, you would scale time to the length of the arrow. That stated, typically the arrow’s length is irrelevant (in this book, unless explicitly stated, all network diagrams are not to scale).
对于箭头图,所有活动都必须有一个开始事件和完成事件。为整个项目添加一个总体开始和完成事件也是很好的做法。
With an arrow diagram, all activities must have a start event and a completion event. It is also good practice to add an overall start and completion event for the project as a whole.
假设在图 8-1的网络中,活动4还依赖于活动1。如果活动2已经依赖于活动1,则箭头图存在问题,因为您无法拆分活动1的箭头。解决方案是在活动1的完成事件和活动4的开始事件之间引入一个虚拟活动(如图 8-2中的虚线箭头所示)。虚拟活动是持续时间为零的活动,其唯一目的是表达对其尾节点的依赖关系。
Suppose in the network of Figure 8-1, activity 4 also depends on activity 1. If activity 2 already depends on activity 1, the arrow diagram has a problem, because you cannot split the arrow of activity 1. The solution is to introduce a dummy activity between the completion event of activity 1 and the start event of 4 (shown as a dashed arrow in Figure 8-2). The dummy activity is an activity of zero duration whose sole purpose is to express the dependency on its tail node.
图 8-2使用虚拟活动
Figure 8-2 Use of a dummy activity
虽然这两种符号是等效的,但每种符号都有各自的优缺点。箭头图的一大优势是完成事件是指定里程碑的自然位置。里程碑是指示项目重要部分完成的事件。使用节点图时,您通常必须添加持续时间为零的活动作为里程碑。
While the two notations are equivalent, there are distinct pros and cons for each. One point favoring arrow diagrams is that the completion events are a natural place to designate milestones. A milestone is an event denoting the completion of a significant part of the project. With node diagrams you typically have to add activities of zero duration as milestones.
几乎每个人都需要一些练习才能正确绘制和阅读箭头图,而人们直观地绘制节点图并理解它们,这让节点图看起来具有明显的优势。节点图乍一看似乎不需要虚拟活动,因为您可以添加另一个依赖箭头(例如图 8-1左侧活动1和4之间的另一个箭头)。出于这些简单的原因,绝大多数绘制网络图的工具都使用节点图。
Nearly everyone requires a bit of practice to correctly draw and read an arrow diagram, whereas people intuitively draw node diagrams and understand them, giving node diagrams what appears to be a clear advantage. Node diagrams at first seem to have no need for a dummy activity because you can just add another dependency arrow (such as another arrow between activities 1 and 4 on the left side of Figure 8-1). For these simplistic reasons, the vast majority of tools for drawing network diagrams use node diagrams.
相比之下,IDesign 至少有四位客户开发了箭头图绘制工具(其中两种工具随本书的支持文件一起提供)。他们投资箭头图工具是因为所有节点图都存在一个关键缺陷。请考虑图 8-3中的网络。
In contrast, at least four of IDesign’s customers have developed arrow diagramming tools (and two of these tools are available along with the support files for this book). They invested in the tools for arrow diagrams due to a crucial flaw in all node diagrams. Consider the networks in Figure 8-3.
图 8-3节点图与箭头图中的重复依赖关系 [摘自 James M. Antill 和 Ronald W. Woodhead 所著《建筑实践中的关键路径》第 4 版(Wiley,1990 年),并做了相应修改。]
Figure 8-3 Repeated dependencies in node versus arrow diagrams [Adopted and modified from James M. Antill and Ronald W. Woodhead, Critical Path in Construction Practice, 4th ed. (Wiley, 1990).]
图 8-3描绘了两个相同的网络,都包含六项活动:1、2、3、4、5和6。活动4、5和6都依赖于活动1、2和3。使用箭头图,网络直观易懂,而相应的节点图则是纠缠不清的猫形摇篮。您可以通过引入持续时间为零的虚拟节点来清理节点图,但这可能会与里程碑混淆。
Figure 8-3 depicts two identical networks, both comprising six activities: 1, 2, 3, 4, 5, and 6. Activities 4, 5, and 6 all depend on activities 1, 2, and 3. With an arrow diagram the network is straightforward and easy to understand, while the corresponding node diagram is an entangled cat’s cradle. You can clean up the node diagram by introducing a dummy node of zero duration, but that may get confused with a milestone.
事实证明,图 8-3中的情况在设计良好的软件系统中非常常见,在这些系统中,您会在架构的各个层之间看到重复的依赖关系。例如,活动1、2和3可能是ResourceAccess服务,而活动4、5和6可能是一些管理器和引擎,每个管理器和引擎都使用所有三个ResourceAccess服务。使用节点图,即使在图 8-3所示的简单项目网络中也很难弄清楚发生了什么。当您添加资源、客户端和实用程序时,该图变得完全难以理解。
As it turns out, the situation in Figure 8-3 is very common in well-designed software systems in which you have repeated dependencies across the layers of the architecture. For example, activities 1, 2, and 3 could be ResourceAccess services, and activities 4, 5 and 6 could be some Managers and Engines, each using all three ResourceAccess services. With node diagrams, it is difficult to figure out what is going on even in a simple project network like that shown in Figure 8-3. By the time you add Resources, Clients, and Utilities, the diagram becomes utterly incomprehensible.
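A rough way to see why repeated dependencies clutter a node diagram is to count the arrows each notation needs. The sketch below assumes, as an illustration rather than a description of the actual figure, that the arrow diagram joins the two layers through a single shared event; the function names and the event label `"E"` are hypothetical.

```python
def node_diagram_arrows(upstream, downstream):
    """In a node diagram every dependency is its own arrow."""
    return [(u, d) for u in upstream for d in downstream]


def arrow_diagram_arrows(upstream, downstream, shared_event="E"):
    """In an arrow diagram every activity is one arrow.

    Assumption: all upstream activities end at one shared event node,
    and all downstream activities start from that same event.
    """
    into_event = [(f"start of {a}", shared_event) for a in upstream]
    out_of_event = [(shared_event, f"end of {a}") for a in downstream]
    return into_event + out_of_event


upstream = [1, 2, 3]     # e.g., ResourceAccess services
downstream = [4, 5, 6]   # e.g., Managers and Engines

node_count = len(node_diagram_arrows(upstream, downstream))
arrow_count = len(arrow_diagram_arrows(upstream, downstream))
print(f"node diagram: {node_count} dependency arrows")
print(f"arrow diagram: {arrow_count} activity arrows")
```

For n upstream and m downstream activities, the node diagram needs n × m crossing dependency arrows, while the arrow diagram under this assumption needs only n + m, and the gap widens with every layer added.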
绘制难以理解的网络图毫无意义。网络图的主要目的是沟通:您努力将项目设计传达给其他人,甚至传达给自己。如果模型没有人能理解,也没有人能与之产生共鸣,那么网络图的初衷就落空了。
It is pointless to draw unintelligible network diagrams. The primary purpose of the network diagram is communication: You strive to communicate your project design to others or even to yourself. Having a model that no one can understand and to which no one can relate defeats the purpose of having the network diagram in the first place.
因此,您应避免使用节点图,而应使用箭头图。拥有简洁、清晰、无杂乱的项目模型所带来的好处足以抵消最初的箭头图学习曲线。缺乏广泛可用的箭头图工具支持(迫使您手动绘制箭头图)并不一定是坏事。手工绘制网络很有价值,因为在此过程中您可以查看和验证活动依赖关系,甚至可能揭示有关项目的其他见解。
Consequently, you should avoid node diagrams and use arrow diagrams. The initial arrow diagram learning curve is more than offset by the benefits of having a concise, clear, clutter-free model of your project. The lack of widely available tool support for arrow diagrams, which forces you to draw your arrow diagram manually, is not necessarily a bad thing. Drawing the network by hand is valuable because in the process you review and verify activity dependencies, and it may even unveil additional insights about the project.
关键路径上的活动必须尽快按计划完成,以免延误项目。非关键活动可以延迟,而不会影响进度;换句话说,它们可以浮动到必须开始为止。一个没有任何浮动时间的项目,其中所有网络路径都是关键的,理论上可以履行其承诺,但实际上任何地方的任何失误都会导致延误。从设计角度来看,浮动时间是项目的安全边际。在设计项目时,您总是希望在网络中保留足够的浮动时间。然后,开发团队可以使用这些浮动时间以补偿非关键活动中的不可预见的延迟。低浮动时间的项目面临很高的延误风险。低浮动时间活动的任何超过轻微的延迟都会导致该活动变得关键并使项目脱轨。
Activities on the critical path must complete as soon as planned to avoid delaying the project. Noncritical activities may be delayed without slipping the schedule; in other words, they can float until they must begin. A project without any float, where all network paths are critical, could in theory meet its commitments, but in practice any misstep anywhere will cause a delay. From a design perspective, floats are the project’s safety margins. When designing a project, you always want to reserve enough float in the network. The development team can then consume this float to compensate for unforeseen delays in noncritical activities. Low-float projects are at high risk of delays. Anything more than a minor delay on a low-float activity will cause that activity to become critical and derail the project.
到目前为止,对浮动时间的讨论有些简单,因为浮动时间实际上有几种类型。本章讨论两种类型:总浮动时间和自由浮动时间。
The discussion of floats so far was somewhat simplistic because there are actually several types of floats. This chapter discusses two types: total float and free float.
活动的总浮动时间是指在不延迟整个项目的情况下,您可以延迟该活动完成的时间。当某项活动的完成延迟的时间少于其总浮动时间时,其下游活动也可能会延迟,但项目的完成不会延迟。这意味着总浮动时间是活动链的一个方面,而不仅仅是特定活动。请考虑图 8-4顶部的网络,其中用粗线显示了关键路径,其上方是非关键路径或活动链。
An activity’s total float is the amount of time by which you can delay the completion of that activity without delaying the project as a whole. When the completion of an activity is delayed by an amount less than its total float, its downstream activities may be delayed as well, but the completion of the project is not delayed. This means that total float is an aspect of a chain of activities, not just of a particular activity. Consider the network in the top part of Figure 8-4, which shows the critical path in bold lines and a noncritical path or chain of activities above it.
图 8-4浮动作为活动链的一部分
Figure 8-4 Float as an aspect of a chain of activities
为了便于讨论,图 8-4按比例绘制,以便每条线的长度与每项活动的持续时间相对应。非关键活动的总浮动量都相同,由活动箭头末端的红线表示。想象一下,图中上半部分的第一个非关键活动的开始时间被延迟,或者活动所花的时间比预估的要长。当该活动执行时,完成上游活动的延迟会消耗下游活动的总浮动量(如图下半部分所示)。
For the purpose of this discussion, Figure 8-4 is drawn to scale so that the length of each line corresponds to each activity’s duration. The noncritical activities all have the same amount of total float, indicated by the red line at the end of the activity’s arrow. Imagine that the start of the first noncritical activity in the top half of the figure is delayed or that the activity takes longer than its estimation. While that activity executes, the delay in completing the upstream activity consumes the total float of the downstream activities (shown in the bottom half of the figure).
所有非关键活动都有一定的总浮动时间,同一非关键链上的所有活动都共享部分总浮动时间。如果活动也安排尽快开始,那么同一链上的所有活动将具有相同的总浮动时间。在链上更靠前的某个地方消耗总浮动时间会将其从下游活动中耗尽,从而使这些活动变得更加关键和更具风险。
All noncritical activities have some total float, and all activities on the same noncritical chain share some of the total float. If the activities are also scheduled to start as soon as possible, then all activities on the same chain will have the same amount of total float. Consuming the total float somewhere further up a chain will drain it from the downstream activities, making them more critical and riskier.
一项活动的自由浮动时间是指在不影响项目中其他活动的情况下,您可以延迟该活动完成的时间。当一项活动的完成延迟的时间小于或等于其自由浮动时间时,下游活动不会受到任何影响,当然整个项目也不会延迟。请看图 8-5。
An activity’s free float is the amount of time by which you can delay the completion of that activity without disturbing any other activity in the project. When the completion of an activity is delayed by an amount less than or equal to its free float, the downstream activities are not affected at all, and of course the project as a whole is not delayed. Consider Figure 8-5.
图 8-5消耗自由浮动时间
Figure 8-5 Consuming free float
再次,为了便于讨论,图 8-5是按比例绘制的。假设图上半部分非关键链中的第一个活动有一些自由浮动时间,由活动箭头末端的红色虚线表示。假设该活动的延迟时间小于(或等于)其自由浮动时间。您可以看到下游活动不知道该延迟(图的下半部分)。
Again, for the purpose of this discussion, Figure 8-5 is drawn to scale. Suppose the first activity in the noncritical chain in the top part of the figure has some free float, indicated by the dotted red line at the end of the activity’s arrow. Imagine that the activity is delayed by an amount of time less than (or equal to) its free float. You can see that the downstream activities are unaware of that delay (bottom part of diagram).
有趣的是,虽然任何非关键活动总是有一定的总浮动时间,但活动可能有也可能没有自由浮动时间。如果你安排非关键活动尽快、一个紧接一个地开始,那么即使这些活动是非关键的,它们的自由浮动时间也是零,因为任何延迟都会扰乱链上的其他非关键活动。但是,连接回关键路径的非关键链上的最后一个活动总是有一些自由浮动时间(否则它也可能是一个关键活动)。
Interestingly, while any noncritical activity always has some total float, an activity may or may not have free float. If you schedule your noncritical activities to start as soon as possible, back to back, then even though these activities are noncritical, their free float is zero because any delay will disrupt the other noncritical activities on the chain. However, the last activity on a noncritical chain that connects back to the critical path always has some free float (or it would be a critical activity, too).
自由浮动在项目设计期间用处不大,但在项目执行期间却非常有用。当某项活动被延迟或超出其工作量估算时,延迟活动的自由浮动可让项目经理了解在项目中的其他活动受到影响(如果有的话)之前有多少时间可用。如果延迟小于延迟活动的自由浮动,则实际上无需执行任何操作。如果延迟大于自由浮动(但小于总浮动),项目经理可以从延迟中减去自由浮动,并准确衡量延迟对下游活动的干扰程度并采取适当的措施。
Free float has little use during project design, but it can prove very useful during project execution. When an activity is delayed or exceeds its effort estimation, the free float of the delayed activity enables the project manager to know how much time is available before other activities in the project will be affected, if at all. If the delay is less than the delayed activity’s free float, nothing really needs to be done. If the delay is greater than the free float (but less than the total float), the project manager can subtract the free float from the delay and accurately gauge the degree by which the delay will interfere with downstream activities and take appropriate actions.
项目网络中的浮动时间取决于活动持续时间、活动依赖关系以及您可能引入的任何延迟。这些都与您安排这些活动时的实际日历日期无关。即使项目的实际开始日期尚未确定,您也可以计算浮动时间。
The floats in the project network are a function of the activity durations, their dependencies, and any delays you may introduce. None of these have to do with actual calendar dates when you schedule these activities. You can calculate the floats even if the actual start date of the project is as yet undecided.
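Because floats depend only on durations and dependencies, they can be computed without any calendar dates. The sketch below is one common way to do so, a forward and backward pass over the network; it is an illustration, not the book's tooling. The function name `compute_floats`, the activity names, and the durations (in days) are hypothetical, and all activities are assumed to start as soon as possible.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def compute_floats(durations, predecessors):
    """Return (total_float, free_float) dictionaries keyed by activity."""
    graph = {a: predecessors.get(a, ()) for a in durations}
    order = list(TopologicalSorter(graph).static_order())

    successors = {a: [] for a in durations}
    for activity, preds in graph.items():
        for p in preds:
            successors[p].append(activity)

    # Forward pass: earliest start and finish of every activity.
    early_start, early_finish = {}, {}
    for a in order:
        early_start[a] = max((early_finish[p] for p in graph[a]), default=0)
        early_finish[a] = early_start[a] + durations[a]
    project_end = max(early_finish.values())

    # Backward pass: latest start, then total and free float.
    late_start, total_float, free_float = {}, {}, {}
    for a in reversed(order):
        late_finish = min((late_start[s] for s in successors[a]),
                          default=project_end)
        late_start[a] = late_finish - durations[a]
        total_float[a] = late_start[a] - early_start[a]
        next_start = min((early_start[s] for s in successors[a]),
                         default=project_end)
        free_float[a] = next_start - early_finish[a]
    return total_float, free_float


# Hypothetical network: A -> B -> E is critical; A -> C -> D -> E is not.
durations = {"A": 5, "B": 10, "C": 3, "D": 4, "E": 2}
predecessors = {"B": ["A"], "C": ["A"], "D": ["C"], "E": ["B", "D"]}

total_float, free_float = compute_floats(durations, predecessors)
for a in durations:
    print(f"{a}: total float {total_float[a]}, free float {free_float[a]}")
```

In this example, C and D sit on the same noncritical chain and share the same total float of 3 days, while only D, the last activity before the chain rejoins the critical path, has any free float.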
在大多数规模适中的网络中,如果手动进行此类浮动计算,很容易出错,很快就会失控,并且会因网络的任何变化而失效。好消息是,这些都是纯机械计算,您应该使用工具来计算浮动(见脚注 1)。有了总浮动值,您就可以将它们记录在项目网络中,如图 8-6所示。该图显示了一个示例项目网络,其中黑色数字是每个活动的 ID,箭头下方的蓝色数字是非关键活动的总浮动值。
In most decent-size networks, such float calculations, if done manually, are error prone, get quickly out of hand, and are invalidated by any change to the network. The good news is that these are purely mechanical calculations, and you should use tools for calculating the floats (see footnote 1). With the total float values at hand, you can record them on the project network as shown in Figure 8-6. This figure shows a sample project network in which the numbers in black are each activity’s ID, and the numbers in blue below the arrows are the total float for the noncritical activities.
图 8-6记录网络上的浮动总量
Figure 8-6 Recording total floats on the network
1.您可以使用 Microsoft Project 计算每个活动的浮动时间,方法是插入Total Slack和Free Slack列,它们对应于总浮动时间和自由浮动时间。要了解如何手动计算浮动时间,请参阅 James M. Antill 和 Ronald W. Woodhead 的《Critical Path in Construction Practice》(第 4 版)(Wiley,1990 年)。
1. You can use Microsoft Project to calculate the floats for each activity by inserting the Total Slack and the Free Slack columns, which correspond to the total and free float. To learn how to manually calculate floats, see James M. Antill and Ronald W. Woodhead, Critical Path in Construction Practice, 4th ed. (Wiley, 1990).
虽然项目设计只需要总时差,但您也可以在网络图中记录自由时差。项目经理会发现这些信息在项目执行过程中非常有价值。
While only total float is required for project design, you can also record free float in the network diagram. The project manager will find this information invaluable during project execution.
如图 8-6所示,在网络图上捕获有关浮动时间的信息并不理想。人类处理字母数字数据的速度很慢,很难与此类数据联系起来。很难检查复杂网络(甚至是简单网络,如图 8-6所示)并一目了然地评估网络的关键性。网络的关键性表明风险区域在哪里以及项目与全关键网络的距离。通过对箭头和节点进行颜色编码,可以更直观地显示总浮动时间,例如,使用红色表示低浮动值,黄色表示中等浮动值,绿色表示高浮动值。您可以通过多种方式划分三个浮动值范围:
Capturing the information about the floats on the network diagram, as shown in Figure 8-6, is not ideal. Human beings process alphanumeric data slowly and have a hard time relating to such data. It is difficult to examine complex networks (or even simple ones, like that in Figure 8-6) and assess the criticality of the network at a glance. The criticality of the network indicates where the risky areas are and how close the project is to an all-critical network. Total floats are better visualized by color-coding the arrows and nodes: for example, using red for low float values, yellow for medium float values, and green for high float values. You can partition the three float value ranges in several ways:
相对关键性。相对关键性将网络中所有活动的浮动时间的最大值分为三个相等的部分。例如,如果最大浮动时间为 45 天,则红色表示浮动时间为 1 至 15 天,黄色表示浮动时间为 16 至 30 天,绿色表示浮动时间为 31 至 45 天。如果最大浮动时间是一个大数字(例如大于 30 天),并且浮动时间在此数字范围内均匀分布,则此技术效果很好。
Relative criticality. Relative criticality divides the maximum value of the float of all activities in the network into three equal parts. For example, if the maximum float is 45 days, then red would be 1 to 15 days, yellow would be 16 to 30 days, and green would be 31 to 45 days of float. This technique works well if the maximum float is a large number (such as greater than 30 days) and the floats are uniformly distributed up to that number.
指数临界性。相对临界性假设延迟风险在浮动范围内大致均匀分布。实际上,浮动时间为 5 天的活动比浮动时间为 10 天的活动更有可能使项目脱轨,尽管两者都可能被相对临界性归类为红色。为了解决这个问题,指数临界性将最大浮动范围划分为三个不相等、指数较小的范围。我建议在范围的 1/9 和 1/3 处进行划分:这些划分的大小合理,但比 1/4 和 1/2 产生的划分更为激进,除数与颜色数量成正比。例如,如果最大浮动时间为 45 天,则红色为 1 至 5 天,黄色为 6 至 15 天,绿色为 16 至 45 天的浮动。与相对临界性一样,如果最大总浮时量是一个很大的数字(比如大于 30 天),并且浮时量在该数字之前均匀分布,则指数临界性会发挥很好的作用。
Exponential criticality. Relative criticality assumes that the risk for delay is somewhat equally spread across the range of float. In reality, an activity with 5 days of float is much more likely to derail the project than an activity with 10 days of float, even though both may be classified as red by relative criticality. To address this issue, the exponential criticality divides the range of the maximum float into three unequal, exponentially smaller ranges. I recommend making the divisions at 1/9 and 1/3 of the range: These divisions are reasonably sized but more aggressive than those produced by 1/4 and 1/2, and the divisors are proportional to the number of colors. For example, if the maximum float is 45 days, then red would be 1 to 5 days, yellow would be 6 to 15 days, and green would be 16 to 45 days of float. As with relative criticality, the exponential criticality works well if the maximum total float is a large number (such as greater than 30 days) and the floats are uniformly distributed up to that number.
绝对关键性。绝对关键性分类与最大浮动值以及浮动在范围内的分布均匀性无关。绝对关键性为每个颜色分类设置了一个绝对浮动范围。例如,红色活动是浮动时间为 1 至 9 天的活动,黄色活动是浮动时间为 10 至 26 天的活动,绿色活动是浮动时间为 27 天(或更多)。绝对关键性分类简单明了,在大多数项目中效果很好。缺点是可能需要根据手头的项目定制范围以反映风险。例如,在为期 2 个月的项目中,10 天可能是绿色,但在为期一年的项目中则是红色。
Absolute criticality. The absolute criticality classification is independent of both the value of the maximum float and how uniformly the floats are distributed along the range. The absolute criticality sets an absolute float range for each color classification. For example, red activities would be those with 1 to 9 days of float, yellow would be 10 to 26 days of float, and green activities would be 27 days of float (or more). Absolute criticality classification is straightforward and works well in most projects. The downside is that it may require customizing the ranges to the project at hand to reflect the risk. For example, 10 days may be green in a 2-month project but red in a year-long project.
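The three partitioning schemes differ only in where they place the red and yellow limits, which a short sketch can make concrete. The function names below are hypothetical; the limits (thirds of the maximum float, 1/9 and 1/3 of the maximum float, and the 9-day and 26-day absolute limits) are the ones given in the examples above. Activities with zero float are reported as critical.

```python
def classify(total_float, red_limit, yellow_limit):
    """Map a total float (in days) to a color, given the two upper limits."""
    if total_float <= 0:
        return "critical"
    if total_float <= red_limit:
        return "red"
    if total_float <= yellow_limit:
        return "yellow"
    return "green"


def relative(total_float, max_float):
    """Three equal parts of the maximum float."""
    return classify(total_float, max_float / 3, 2 * max_float / 3)


def exponential(total_float, max_float):
    """Divisions at 1/9 and 1/3 of the maximum float."""
    return classify(total_float, max_float / 9, max_float / 3)


def absolute(total_float, red_limit=9, yellow_limit=26):
    """Fixed ranges, independent of the maximum float."""
    return classify(total_float, red_limit, yellow_limit)


max_float = 45
for days in (0, 5, 10, 16, 31):
    print(f"{days:2d} days: relative={relative(days, max_float)}, "
          f"exponential={exponential(days, max_float)}, "
          f"absolute={absolute(days)}")
```

With a maximum float of 45 days, an activity with 10 days of float is red under relative criticality but yellow under both exponential and absolute criticality, which illustrates how the choice of scheme changes the picture of risk.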
图 8-7显示了与图 8-6相同的网络,使用刚刚建议的绝对浮动范围对绝对关键性进行颜色编码分类。黑色表示的关键活动没有浮动。
Figure 8-7 shows the same network as in Figure 8-6 with color coding for absolute criticality classification using the absolute float ranges just suggested. The critical activities in black have no float.
图 8-7浮动时间颜色编码
Figure 8-7 Floats color coding
将图 8-7中的视觉信息与图 8-6中的相同文本信息进行比较,你就能立即发现该项目的第二部分是有风险的。
Compare the ease with which you can interpret the visual information in Figure 8-7 versus the same textual information in Figure 8-6. You can immediately see that the second part of the project is risky.
如第 7 章所述,将资源分配给活动的最安全、最有效的方法是基于浮动时间(或者根据本章的定义,总浮动时间)。这是最安全的方法,因为您首先处理风险较高的活动,也是最有效的,因为您可以最大限度地提高资源利用时间的百分比。
As stated in Chapter 7, the safest and most efficient way to assign resources to activities is based on float—or, given the definition of this chapter, total float. This is the safest method because you address the riskier activities first, and it is the most efficient because you maximize the percentage of time for which the resources are utilized.
请考虑图 8-8所示的进度表快照。这里,每个彩色条的长度代表该活动的时间尺度持续时间,右侧或左侧位置与进度表对齐。
Consider the snapshot of a scheduling chart shown in Figure 8-8. Here the length of each colored bar represents the time-scaled duration of that activity, and the right or left position is aligned with the schedule.
图 8-8不消耗浮动时最大化资源需求
Figure 8-8 Maximizing resource demand when not consuming float
图中共有四项活动:1、2、3、4。所有活动都准备好在同一天开始。由于下游活动(未显示),活动1是关键的,而活动2、3和4具有不同程度的总浮动时间,这通过颜色编码表示:2为红色(低浮动时间)、3为黄色(中等浮动时间)、4为绿色(高浮动时间)。假设这些都是所有开发人员都能同样出色地完成的开发活动,并且不存在任务连续性问题。在为该项目配备人员时,首先必须为关键活动1指派一名开发人员。如果有第二名开发人员,则应将该开发人员指派到活动2,该活动在所有其他活动中具有最低的浮动时间。这样,您可以使用多达四名开发人员,尽快开展每一项活动。
The figure has four activities: 1, 2, 3, 4. All activities are ready to start on the same date. Due to downstream activities (not shown), activity 1 is critical, while activities 2, 3, and 4 have various levels of total float indicated by their color coding: 2 is red (low float), 3 is yellow (medium float), and 4 is green (high float). Suppose these are all development activities that all developers can perform equally well, and that there are no task continuity issues. When staffing this project, you first must assign a developer to the critical activity 1. If you have a second developer, you should assign that developer to activity 2, which has the lowest float of all other activities. This way, you can utilize as many as four developers, working on each of the activities as soon as possible.
或者,你可以只为项目配备两名开发人员(见图 8-9)。和前面一样,第一位开发人员负责活动1。第二位开发人员尽快开始活动2,因为推迟近乎关键的活动并使其成为关键活动是没有意义的。
Alternatively, you can staff the project with only two developers (see Figure 8-9). As before, the first developer works on activity 1. The second developer starts with activity 2 as soon as possible because there is no point in postponing a near-critical activity and making it critical.
图 8-9资源交易浮动
Figure 8-9 Trading float for resources
一旦活动2完成,第二位开发人员将转到具有最低浮动时间的剩余活动,即活动3。这需要将活动3在时间线上向后重新安排,直到第二位开发人员在完成活动2后有空。这只能通过消耗(减少)活动3的浮动时间来实现;在此示例中,由于一些下游依赖项,这会导致活动3变为红色。活动3完成后,第二位开发人员继续处理活动4。这也只能通过消耗活动4的可用浮动时间来实现;虽然在此示例中是可以接受的,但这会使活动4的浮动时间从绿色变为黄色。
Once activity 2 is complete, the second developer moves to the remaining activity with the lowest float, activity 3. This requires rescheduling 3 further down the timeline, until the second developer is available after completing activity 2. This is only possible by consuming (reducing) the float of activity 3; due to some downstream dependencies in this example, this causes activity 3 to become red. Once activity 3 is complete, the second developer proceeds to work on activity 4. This, too, is possible only by consuming the available float of activity 4; while acceptable in this example, this turns the float of activity 4 from green to yellow.
这种人员配置形式用浮动时间换取资源,实际上就是换取成本。分配资源时,浮动时间的使用方式有两种:根据浮动时间,从低到高将资源分配给可用活动;如果需要,可以使用活动的浮动时间,以较少的资源为项目配备人员,而不会延误项目。
This form of staffing trades float for resources and, in effect, for cost. When assigning resources, you use floats in two ways: You assign resources to available activities based on floats, low to high, and if needed, you consume activities’ float to staff the projects with a smaller level of resources without delaying the project.
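The float-based assignment just described can be sketched as a small greedy procedure. This is a simplified illustration, not the book's algorithm: it assumes all activities are ready at time 0, that any developer can do any activity, and that delaying an activity's start by some number of days reduces its float by the same number. The function name `staff_by_float`, the durations, and the float values are hypothetical.

```python
import heapq


def staff_by_float(activities, developer_count):
    """Assign activities to developers, lowest total float first.

    activities: {activity_id: (duration_days, total_float_days)}
    Returns {activity_id: {"developer", "start", "remaining_float"}}.
    """
    # Min-heap of (day the developer becomes free, developer index).
    free_at = [(0, dev) for dev in range(developer_count)]
    heapq.heapify(free_at)

    plan = {}
    by_float = sorted(activities.items(), key=lambda item: item[1][1])
    for activity, (duration, total_float) in by_float:
        start, dev = heapq.heappop(free_at)
        if start > total_float:
            raise ValueError(f"activity {activity} would delay the project")
        plan[activity] = {
            "developer": dev,
            "start": start,
            "remaining_float": total_float - start,  # float consumed by waiting
        }
        heapq.heappush(free_at, (start + duration, dev))
    return plan


# Hypothetical numbers: activity 1 is critical; 2, 3, 4 have rising float.
activities = {1: (30, 0), 2: (5, 5), 3: (10, 12), 4: (10, 40)}

four_developers = staff_by_float(activities, 4)
two_developers = staff_by_float(activities, 2)
for activity, entry in two_developers.items():
    print(activity, entry)
```

With four developers, every activity starts immediately and no float is consumed. With two developers, the second developer works through activities 2, 3, and 4 in sequence, so activities 3 and 4 keep only 7 and 25 days of float, respectively: fewer resources at the price of less float.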
如上所述,根据浮动时间分配资源允许您用浮动时间换取成本。您可能想用项目的所有浮动时间换取更低的成本,但这很少是一个好主意,因为浮动时间很少的项目对延迟的容忍度较低。当您用资源换取浮动时间时,您降低了成本但增加了风险。实际上,您不仅仅是用浮动时间换取更低的成本,而且用更低的成本换取更高的风险。因此,浮动时间交易是三方交易。在图 8-9的示例中,使用两个开发人员而不是四个开发人员降低了成本,但副作用是增加了项目风险。在项目设计期间,您应该不断管理剩余的浮动时间,并通过这样做来管理项目的风险。这允许您制定提供不同进度、成本和风险组合的多个选项。
As just stated, assigning resources based on float allows you to trade float for cost. You may be tempted to trade all the project’s float for lower cost, but that is rarely a good idea because a project with little float has less tolerance for delays. When you trade float for resources, you reduce the cost but increase the risk. In effect, you are not merely trading float for lower cost, but lower cost for higher risk. The float trade therefore is a three-way trade. In the example of Figure 8-9, using two developers rather than four has reduced cost but, as a side effect, made the project riskier. During project design you should constantly manage the remaining float, and by doing so manage the risk of the project. This allows you to craft several options that offer different blends of schedule, cost, and risk.
交付任何系统的最快方式是沿着关键路径构建系统。设计良好的项目也会沿着关键路径高效地分配所需的最少资源,但项目的持续时间仍受关键路径的限制。您可以通过采用有助于快速、干净地进行开发的软件工程实践来加速执行。除了这些开发最佳实践之外,本章还讨论了如何通过压缩关键路径来缩短进度。这种进度缩短的主要方法是重新设计项目,生成几个更短、更压缩的项目设计解决方案。然后,您将看到时间成本曲线的基本概念以及时间和成本在项目中如何相互作用。结果是一组项目设计选项,使您既能够最好地预先满足管理层对时间和成本的期望,又能够在情况发生变化时快速调整。
The fastest way to deliver any system is to build it along its critical path. A well-designed project also efficiently assigns the minimally required resources along its critical path, but the project’s duration is still bounded by the critical path. You can accelerate the execution by adopting software engineering practices that facilitate quick and clean development. Beyond these development best practices, this chapter discusses what you can do to reduce the schedule by compressing the critical path. The primary technique for this type of schedule reduction is redesigning the project by producing several shorter, ever-more-compressed project design solutions. You will then see the fundamental concept of the time–cost curve and how time and cost interplay in a project. The result is a set of project design options that enable you both to best fit up front the wishes of management for time and cost and to pivot quickly in case things change.
与许多人的想法相反,按时完成任务的最佳方式几乎从来不是更加努力地工作或让更多人参与项目。它需要更聪明、更干净、更正确地工作,同时采用一系列最佳实践。一般来说,以下技术在任何软件项目中都是可行的,并将加速整个项目的进度:
Contrary to what many believe, the best way of meeting a deadline hardly ever involves just working harder or throwing more people at the project. It does involve working smarter, cleanly, and correctly while embracing a wide set of best practices. In general, the following techniques are possible in any software project and will accelerate the project as a whole:
保证质量。大多数团队错误地将他们的质量控制和测试活动称为质量保证 (QA)。真正的 QA 与测试关系不大。它通常涉及一位高级专家来回答以下问题:如何保证质量?答案必须包括如何引导整个开发过程以确保质量,如何防止问题发生,以及如何跟踪问题的根本原因并修复它们。QA 人员的存在是组织成熟度的标志,几乎总是表明对质量的承诺,理解质量不会自动发生,并承认组织必须积极追求质量。QA 人员有时负责设计流程和为关键阶段编写程序。由于质量决定生产力,因此适当的 QA 总是能加快进度,并且它使实施 QA 的组织有别于行业中的其他组织。
Assure quality. Most teams incorrectly refer to their quality control and testing activities as quality assurance (QA). True QA has little to do with testing. It typically involves a single, senior expert who answers the question: What will it take to assure quality? The answer must include how to orient the entire development process to assure quality, how to prevent problems from ever happening, and how to track the root causes of problems and to fix them. The presence of a QA person is a sign of organizational maturity and is almost always indicative of commitment to quality, of understanding that quality does not happen on its own, and of acknowledging that the organization must actively pursue quality. The QA person is sometimes responsible for designing the process and authoring procedures for key phases. Since quality leads to productivity, proper QA always accelerates the schedule, and it sets organizations that practice QA apart from the rest of the industry.
聘用测试工程师。测试工程师不是测试人员,而是成熟的软件工程师,他们设计和编写代码的目的是破坏系统代码。测试工程师一般比普通软件工程师水平更高,因为编写测试工程代码通常涉及更困难的任务:开发虚假通信渠道;设计和开发回归测试;设计测试装置、模拟器、自动化等等。测试工程师非常熟悉系统的架构和内部工作原理,他们利用这些知识试图在每个环节破坏系统。拥有这样一个随时准备破坏产品的“反系统”系统对质量大有裨益,因为您可以在问题发生时立即发现问题,隔离根本原因,避免变化的连锁反应,消除缺陷叠加掩盖其他缺陷,并大大缩短解决问题的周期。拥有一个稳定、无缺陷的代码库可以加快进度,这是其他任何事情都无法做到的。
Employ test engineers. Test engineers are not testers, but rather full-fledged software engineers who design and write code whose objective is to break the system’s code. Test engineers, in general, are a higher caliber of engineer than regular software engineers, because writing test engineering code often involves more difficult tasks: developing fake communication channels; designing and developing regression testing; and designing test rigs, simulators, automation, and more. Test engineers are intimately familiar with the architecture and inner workings of the system, and they take advantage of that knowledge to try to break the system at every turn. Having such an “anti-system” system ready to tear your product apart does wonders for quality because you can discover problems as soon as they occur, isolate root causes, avoid ripple effects of changes, eliminate superimposition of defects masking other defects, and considerably shorten the cycle time of fixing problems. Having a constant, defect-free codebase accelerates the schedule like nothing else ever does.
增加软件测试人员。在大多数团队中,开发人员的数量都超过测试人员。在测试人员太少的项目中,一两个测试人员无法承担团队规模扩大的负担,他们经常被迫执行几乎没有附加值的测试。这种测试是重复性的,不会随着团队规模或系统复杂性的增加而变化,并且经常将系统视为黑盒。这并不意味着没有进行良好的测试,而是大部分测试都转移到了开发人员身上。改变测试人员与开发人员的比例,例如 1:1 甚至 2:1(有利于测试人员),可以让开发人员花更少的时间进行测试,而花更多的时间为项目增加直接价值。
Add software testers. In most teams, the developers outnumber the testers. In projects with too few testers, the one or two testers cannot afford to scale up with the team, and they are frequently reduced to performing testing that has little added value. Such testing is repetitive, does not vary with the team size or the system’s growing complexity, and often treats the system as a black box. This does not mean that good testing does not take place, but rather that the bulk of the testing is shifted onto developers. Changing the ratio of testers to developers, such as 1:1 or even 2:1 (in favor of testers), allows the developers to spend less time testing and more time adding direct value to the project.
投资基础设施。所有软件系统都需要以安全、消息队列和消息总线、托管、事件发布、日志记录、检测、诊断和分析以及回归测试和测试自动化形式出现的通用实用程序。现代软件系统需要配置管理、部署脚本、构建流程、每日构建和冒烟测试(通常归入 DevOps)。您不应该让每个开发人员编写自己独特的基础设施,而应该投资为整个团队构建(和维护)一个框架,以完成此处列出的大部分或所有项目。这可以让开发人员专注于与业务相关的编码任务,提供巨大的规模经济,使新开发人员更容易入职,减少压力和摩擦,并减少开发系统所需的时间。
Invest in infrastructure. All software systems require common utilities in the form of security, message queues and message bus, hosting, event publishing, logging, instrumentation, diagnostics, and profiling, as well as regression testing and test automation. Modern software systems need configuration management, deployment scripts, a build process, daily builds, and smoke tests (often lumped under DevOps). Instead of having every developer write his or her own unique infrastructure, you should invest in building (and maintaining) a framework for the entire team that accomplishes most or all of the items listed here. This focuses the developers on business-related coding tasks, provides great economy of scale, makes it easier to onboard new developers, reduces stress and friction, and decreases the time it takes to develop the system.
提高开发技能。当今的软件环境的特点是变化率非常高。这种变化率超出了许多开发人员跟上最新语言、工具、框架、云平台和其他创新的能力。即使是最好的开发人员也需要不断地适应技术,他们花费大量的时间在摸索中,以一种非结构化、随意的方式解决问题。更糟糕的是,一些开发人员不堪重负,他们从网上复制粘贴代码,却没有真正理解他们的行为的短期或长期影响(包括法律影响)。为了改善这个问题,你应该投入时间和资源来培训开发人员掌握技术、方法和工具。拥有称职的开发人员将缩短开发任何软件所需的时间。
Improve development skills. Today’s software environments are characterized by a very high rate of change. This rate of change exceeds many developers’ ability to keep up with the latest languages, tools, frameworks, cloud platforms, and other innovations. Even the best developers are perpetually coming to terms with technology, and they spend an inordinate amount of time stumbling, figuring things out in a nonstructured, haphazard way. Even worse, some developers are so overwhelmed that they resort to copying and pasting code from the web, without any real understanding of the short- or long-term implications (including legal ones) of their actions. To ameliorate this problem, you should dedicate the time and resources to train developers on the technology, methodology, and tools at hand. Having competent developers will reduce the time it takes to develop any software.
改进流程。大多数开发环境都存在流程缺陷。他们为了完成流程而敷衍了事,但缺乏对活动背后原因的真正理解或欣赏。这些空洞的活动没有真正的好处,而且它们往往会让事情变得更糟,就像货物崇拜文化¹那样。关于软件开发流程的文章已经很多了。学习经过实践检验的最佳实践,并制定一个改进计划来解决质量、进度和预算问题。根据效果和引入的难易程度对改进计划中的最佳实践进行排序,并主动解决最初没有采用这些最佳实践的原因。编写标准操作程序,让团队和您自己遵循标准操作程序,甚至在必要时强制执行。随着时间的推移,这将使项目更具可重复性,并能够按设定的时间表交付。
Improve the process. Most development environments suffer from a deficient process. They go through the motions for the sake of doing the process, but lack any real understanding or appreciation of the reasoning behind the activities. There are no real benefits from these hollow activities, and they often make things worse, in the manner of a cargo-cult culture.¹ Volumes have been written about software development processes. Educate yourself on the battle-proven best practices and devise an improvement plan that will address the quality, schedule, and budget issues. Sort the best practices in the improvement plan by effect and ease of introduction, and proactively address the reasons why they were absent in the first place. Write standard operating procedures, have the team and yourself follow them, and even enforce them if necessary. Over time, this will make projects more repeatable and able to deliver on the set schedule.
采用和使用标准。全面的编码标准涵盖命名约定和样式、编码实践、项目设置和结构、特定于框架的指南、您自己的指南、您团队的注意事项以及已知的陷阱。该标准有助于实施开发最佳实践并避免错误,将新手提升到老手的水平。它使代码统一,并简化了一个开发人员处理另一个开发人员的代码时产生的任何问题。通过遵守标准,开发人员可以增加成功的机会并减少开发系统所需的时间。
Adopt and employ standards. A comprehensive coding standard addresses naming conventions and style, coding practices, project settings and structure, framework-specific guidelines, your own guidelines, your team’s dos and don’ts, and known pitfalls. The standard helps to enforce development best practices and to avoid mistakes, elevating novices to the level of veterans. It makes the code uniform and eases any issues created when one developer works on another’s code. By complying with the standard, developers increase the chances of success and decrease the time it would otherwise take to develop the system.
提供外部专家的访问权限。大多数团队不会有世界级的专家作为成员。团队的工作是了解业务并交付系统,而不是在安全、托管、用户体验、云、人工智能、商业智能、大数据或数据库架构方面非常出色。重新发明轮子非常耗时,而且永远不如访问现成的、经过验证的知识好(回想一下第 2 章中的 2% 问题)。听从外部专家的建议要好得多,也快得多。根据需要在特定的地方使用这些专家,避免代价高昂的错误。
Provide access to external experts. Most teams will not have world-class experts as members. The team’s job is to understand the business and deliver the system, not to be very good in security, hosting, UX, cloud, AI, BI, Big Data, or database architecture. Reinventing the wheel is very time-consuming and is never as good as accessing readily available, proven knowledge (recall the 2% problem from Chapter 2). It is far better and faster to defer to external experts. Use these experts at specific places as required and avoid costly mistakes.
进行同行评审。最好的调试工具是人眼。开发人员发现彼此代码中的问题的速度通常比代码成为系统的一部分后诊断和消除问题的速度要快得多。当涉及到系统中每个服务的需求或设计和测试计划中的缺陷时,情况也是如此。团队应该审查所有这些,以确保最高质量的代码库。
Engage in peer reviews. The best debugger is the human eye. Developers often detect problems in each other’s code much faster than it takes to diagnose and eliminate the problems once the code is part of the system. This is also true when it comes to defects in the requirements or in the design and test plan of each of the services in the system. The team should review all of these to ensure the highest-quality codebase.
1. https://en.wikipedia.org/wiki/Cargo_cult
这些软件工程最佳实践将加速整个项目,无论具体活动或项目网络本身如何。它们适用于任何项目、任何环境和任何技术。虽然以这种方式改进项目可能看起来成本高昂,但最终成本可能会降低。开发系统所需时间的减少可以抵消改进的成本。
These software engineering best practices will accelerate the project as a whole, irrespective of specific activities or the project network itself. They are effective in any project, in any environment, and with any technology. While it might appear costly to improve the project this way, it can very well end up costing less. The reduction in the time it takes to develop the system pays for the cost of the improvements.
前面列出的进度加速技术存在的问题是,没有一种可以快速解决进度问题;所有技术都需要时间才能见效。但是,您可以通过两种方式立即加快进度:要么使用更好的资源,要么找到并行工作的方法。通过采用这些技术,您可以压缩项目进度。这种进度压缩并不意味着更快地完成相同的工作。进度压缩意味着更快地实现相同的目标,通常是通过做更多的工作来更快地完成任务或项目。您可以将这两种压缩技术结合使用或单独使用,用于项目的某些部分、整个项目或单个活动。这两种压缩技术最终都会增加项目的直接成本(稍后定义),同时缩短进度。
The issue with the items in the previous list of schedule acceleration techniques is that none is a quick fix for the schedule; all of them take time to be effective. However, you can do two things to immediately accelerate the schedule—either work with better resources or find ways of working in parallel. By employing these techniques you will compress the schedule of the project. Such schedule compression does not mean doing the same work faster. Schedule compression means accomplishing the same objectives faster, often by doing more work to finish the task or the project sooner. You can use these two compression techniques in combination with each other or in isolation, on parts of the project, on the project as a whole, or on individual activities. Both compression techniques end up increasing the direct cost (defined later) of the project while reducing the schedule.
高级开发人员将比初级开发人员更快地交付他们负责的部分系统。然而,人们普遍误以为这种差异是因为他们编码速度更快。初级开发人员的编码速度通常比高级开发人员快得多。高级开发人员尽可能少地花时间编码,而是将大部分时间花在设计代码模块、交互和他们打算用于测试的方法上。高级开发人员为他们正在处理的组件和他们使用的服务编写测试平台、模拟器和仿真器。他们记录他们的工作,考虑每个编码决策的影响,并考虑他们服务的可维护性和可扩展性,以及其他方面,如安全性。因此,虽然这些高级开发人员的单位时间编码速度比初级开发人员慢,但他们完成任务的速度更快。正如您可能猜到的那样,高级开发人员的需求量很大,而且薪酬也比初级开发人员高。您应该将这些更好的资源仅分配给关键活动,因为在关键路径之外利用它们不会改变进度。
Senior developers will deliver their part of the system faster than junior developers. However, it is a common misconception that this difference is because they code faster. Often, junior developers code much faster than senior developers. Senior developers spend as little of their time as possible coding, instead spending the bulk of their time designing the code module, the interactions, and the approaches they intend to use for testing. Senior developers write testing rigs, simulators, and emulators for the components they are working on and for the services they consume. They document their work, they contemplate the implications of each coding decision, and they look at maintainability and extensibility of their services, as well as other aspects such as security. Therefore, while per unit of time such senior developers code more slowly than junior developers do, they complete the task more quickly. As you might suspect, senior developers are in high demand and command higher compensation than do junior developers. You should assign these better resources only to critical activities since leveraging them outside the critical path will not alter the schedule.
一般来说,只要你采取一系列连续的活动,并找到并行执行这些活动的方法,就可以加快进度。并行工作有两种可能的方式。第一种是提取活动的内部阶段并将其移动到项目的其他位置。第二种方法是删除活动之间的依赖关系,以便你可以并行处理这些活动(同时将多个人分配到同一项活动是行不通的,如第 7 章所述)。
In general, whenever you take a sequential set of activities and find ways of performing these activities in parallel, you accelerate the schedule. There are two possible ways of working in parallel. The first is by extracting internal phases of an activity and moving them elsewhere in the project. The second way is by removing dependencies between activities so that you can work on these activities in parallel (assigning multiple people to the same activity at the same time does not work, as explained in Chapter 7).
您不必按顺序执行活动的内部阶段,而是可以拆分活动。您可以将一些依赖性较低的阶段与项目中的其他活动并行安排,无论是在活动之前还是之后。适合提取到项目上游(即在其余活动之前)的内部阶段包括详细设计、文档、模拟器、服务测试计划、服务测试工具、API 设计、UI 设计等。适合移动到项目下游的内部阶段包括与其他服务的集成、单元测试和重复文档。拆分活动可以减少它在关键路径上占用的时间并缩短项目。
Instead of performing the internal phases of an activity sequentially, you can split up the activity. You schedule some of the less-dependent phases in parallel to other activities in the project, either before or after the activity. Good candidates for internal phases to extract upstream in the project (i.e., prior to the rest of the activity) include the detailed design, documentation, emulators, service test plan, service test harness, API design, UI design, and so on. Candidates for internal phases to move downstream in the project include integration with other services, unit testing, and repeated documentation. Splitting an activity reduces the time it occupies on the critical path and shortens the project.
您不必按顺序处理相互依赖的活动,而是可以寻找方法来减少甚至消除活动之间的依赖关系,并并行处理这些活动。如果项目中的活动 A 依赖于活动 B,而活动 B 又依赖于活动 C,那么项目的持续时间将是这三个活动持续时间的总和。但是,如果您可以消除 A 和 B 之间的依赖关系,那么您就可以让 A 与 B 和 C 并行进行,并相应地压缩进度。
Instead of working sequentially on dependent activities, you can look for ways to reduce or even eliminate dependencies between activities and work on the activities in parallel. If the project has activity A that depends on activity B, which in turn depends on activity C, the duration of the project would be the sum of the durations of these three activities. However, if you could remove the dependency between A and B, then you could work on A in parallel to B and C and compress the schedule accordingly.
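The effect of removing the dependency between A and B can be checked with a small calculation. The following is a minimal sketch, not a planning tool: the durations (in weeks) are invented for illustration, and `project_duration` is a hypothetical helper that returns the length of the longest dependency chain.

```python
def project_duration(durations, depends_on):
    """Return the length of the longest dependency chain (the critical path)."""
    finish = {}  # cache: activity -> earliest finish time

    def earliest_finish(activity):
        if activity not in finish:
            # An activity can start once all of its predecessors have finished.
            start = max(
                (earliest_finish(p) for p in depends_on.get(activity, ())),
                default=0,
            )
            finish[activity] = start + durations[activity]
        return finish[activity]

    return max(earliest_finish(a) for a in durations)


# Hypothetical durations in weeks (assumed values, not taken from the text).
durations = {"A": 4, "B": 3, "C": 5}

# Original plan: A depends on B, which depends on C.
sequential = project_duration(durations, {"A": ["B"], "B": ["C"]})

# Compressed plan: the A-B dependency is removed, so A runs alongside B and C.
parallel = project_duration(durations, {"B": ["C"]})
```

With these assumed numbers the sequential plan takes 12 weeks, the sum of the three durations, while the parallel plan takes 8 weeks, the length of the remaining B-C chain.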
消除依赖关系通常需要首先投资于能够实现并行工作的额外活动:
Removing dependencies often involves investing in additional activities that enable the parallel work in the first place:
契约设计。通过为服务契约进行单独的设计活动,您可以向其使用者提供接口或契约,然后在他们所依赖的服务完成之前开始处理这些契约。提供契约可能不会完全消除依赖关系,但可以实现一定程度的并行工作。子系统甚至系统之间的 UI、消息、API 或协议的设计也是如此。
Contract design. By having a separate design activity for a service contract, you can provide the interface or the contract to its consumers and then start working on those before the service they depend upon is completed. Providing the contract may not remove the dependency completely, but it could enable some level of parallel work. The same goes for the design of UI, messages, APIs, or protocols between subsystems or even systems.
仿真器开发。根据契约设计,您可以编写一个简单的服务来仿真真实服务。这种实现应该非常简单(始终返回相同的结果且没有错误),并且可以进一步消除依赖关系。
Emulator development. Given the contract design, you could write a simple service that emulates the real service. Such an implementation should be very simple (always returning the same results and without errors) and could further remove the dependencies.
模拟器开发。您可以不只开发一个仿真器,而是为一个或多个服务开发完整的模拟器。模拟器可以维护状态、注入错误,并且与实际服务难以区分。有时编写一个好的模拟器可能比构建实际服务更困难。但是,模拟器确实消除了服务与其客户端之间的依赖关系,从而允许高度并行工作。
Simulator development. Instead of a mere emulator, you could develop a complete simulator for one or more services. The simulator could maintain state, inject errors, and be indistinguishable from the real service. Sometimes writing a good simulator can be more difficult than constructing the real service. However, the simulator does remove the dependency between the service and its clients, allowing a high degree of parallel work.
重复集成和测试。即使某项服务的模拟器非常出色,仅针对该模拟器开发的客户端也值得担忧。一旦实际服务完成,您必须重复该服务与针对该模拟器开发的所有客户端之间的集成和测试。
Repeated integration and testing. Even with a great simulator for a service, a client developed against only that simulator should be a cause for concern. Once the real service is completed, you must repeat the integration and testing between that service and all clients that were developed against the simulator.
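The difference between a contract, an emulator, and a simulator can be sketched in a few lines. Everything named here is hypothetical and chosen only for illustration: the pricing service, the `PricingContract` interface, the canned price, and the way faults are injected are assumptions, not part of the method described above.

```python
from typing import Protocol


class PricingContract(Protocol):
    """Contract design: the only thing clients need in order to start work."""

    def price_cents(self, item_id: str) -> int: ...


class PricingEmulator:
    """Emulator: always returns the same result and never fails."""

    def price_cents(self, item_id: str) -> int:
        return 999  # canned value (assumption)


class PricingSimulator:
    """Simulator: maintains state and can inject errors on chosen calls."""

    def __init__(self, prices, fail_on_calls=()):
        self._prices = dict(prices)
        self._fail_on_calls = set(fail_on_calls)
        self._calls = 0

    def set_price(self, item_id: str, cents: int) -> None:
        self._prices[item_id] = cents

    def price_cents(self, item_id: str) -> int:
        self._calls += 1
        if self._calls in self._fail_on_calls:
            raise TimeoutError("injected fault")
        return self._prices[item_id]


def order_total_cents(pricing: PricingContract, item_ids: list[str]) -> int:
    """A client written against the contract, before the real service exists."""
    return sum(pricing.price_cents(item_id) for item_id in item_ids)
```

The client depends only on the contract, so it can be developed against either stand-in. As the last item in the list notes, it still has to be integrated and tested again once the real service is available.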
有时,并行工作的最佳候选者在人员分配图中显而易见。如果图表包含多个脉冲,也许您可以将这些脉冲分离。
Sometimes the best candidates for parallel work are evident in the staffing distribution chart. If the chart contains several pulses, perhaps you can decouple these pulses.
考虑图 9-1中的图表,其中展示了三个脉冲。在最初的计划中,所有三个脉冲都是按顺序完成的,因为每个脉冲的输出与下一个脉冲的输入之间存在依赖关系。如果你能找到某种方法来消除这些依赖关系,那么你可以同时处理一个或两个脉冲,从而大大压缩时间表。
Consider the chart in Figure 9-1, which exhibits three pulses. In the original plan, all three were done sequentially due to the dependencies between the outputs of each pulse as the inputs to the next one. If you can find some way of removing those dependencies, you can work on one or two of the pulses in parallel to the other one, significantly compressing the schedule.
图 9-1并行工作的候选对象
Figure 9-1 Candidates for parallel work
两种并行工作形式(拆分活动或消除活动之间的依赖关系)通常都需要额外的资源。项目将需要更多资源来与其他活动并行执行提取的阶段。项目还需要更多资源来开展支持并行工作的额外活动,例如重复集成所需的额外开发人员和重复测试所需的额外测试人员。这将增加项目成本和工作量。具体而言,额外的资源将导致团队规模扩大、峰值团队规模增加、噪音增加和执行效率降低。效率降低将进一步推高成本,因为您从每个团队成员那里获得的收益更少。
Both forms of parallel work—splitting activities or removing dependencies between activities—often require additional resources. The project will need more resources to perform the extracted phases in parallel to other activities. The project also will require more resources to work on the additional activities that enable the parallel work, such as additional developers for the repeated integration, and additional testers for the repeated testing. This will increase the cost of the project and the workload. In particular, the additional resources will result in a larger team, higher peak team size, increased noise, and less efficient execution. The reduced efficiency will drive the cost up even further because you get less from each team member.
现有团队可能由于各种原因(缺少架构师、缺少高级开发人员或团队规模不足)无法进行并行工作,从而迫使您求助于昂贵的高级外部人才。即使您能够负担得起项目的总成本,并行工作也会增加现金流率,并可能使项目变得难以承受。简而言之,并行工作不是免费的。
The team at hand may be incapable of parallel work for a variety of reasons (lack of an architect, lack of senior developers, or an inadequate team size), forcing you to resort to expensive high-grade, external talent. Even if you can afford the total cost of the project, the parallel work will increase the cash flow rate and may make the project unaffordable. In short, parallel work is not free.
一般来说,消除活动之间的依赖关系就像拆除炸弹一样——你应该非常小心地去做。并行工作通常会增加项目的执行复杂性,这样做会大大增加对负责项目的项目经理的要求。在进行并行工作之前,你应该投资基础设施,以加速项目中的所有活动,而不会改变跨活动依赖关系。这可能比并行工作更安全、更容易。
In general, removing dependencies between activities is like defusing a bomb—something you should do very carefully. Parallel work often increases the execution complexity of the project, and by doing so drastically increases the demands on the project manager in charge of the project. Before engaging in parallel work, you should invest in infrastructure that accelerates all activities in the project without changing cross-activities dependencies. This is likely safer and easier than parallel work.
尽管如此,并行工作将缩短整体上市时间。在决定采用压缩并行选项时,请仔细权衡并行执行所产生的风险和成本以及预期的进度缩短。
That said, parallel work will reduce the overall time to market. When deciding on pursuing a compressed parallel option, carefully weigh the incurred risks and cost of parallel execution against the expected reduction in the schedule.
至少在最初阶段,增加成本可以加快任何项目的交付速度。在大多数项目中,时间与成本的权衡并不是线性的,而是理想情况下如图 9-2中的曲线所示。
At least initially, adding cost allows delivering any project faster. In most projects, the trade of time-for-cost is not a linear trade, but rather looks ideally like the curve in Figure 9-2.
图 9-2理想化的时间成本曲线
Figure 9-2 Idealized time–cost curve
例如,考虑一个仅包含编码活动的 10 人年项目。将该项目分配给单个开发人员将需要 10 年时间完成,成本为 10 人年。但是,将同一个项目分配给两个开发人员可能需要 7 年或更长时间,而不是 5 年。要在 5 年内完成该项目,您将至少需要 3 名开发人员,更可能需要 5 名甚至 6 名开发人员。这些成本(10 年需要 10 人年的成本,7 年需要 14 人年的成本,5 年需要 30 人年的成本)确实是成本与时间的非线性交易的表现。
For example, consider a 10-man-year project consisting only of coding activities. Assigning this project to a single developer will take 10 years to complete and cost 10 man-years. However, assigning the same project to two developers will likely take 7 years or more, not 5. To complete this project in 5 years, you will need at least 3 developers, and more likely 5 or even 6 developers. These costs (10 years for 10 man-years of cost, 7 years for 14 man-years, and 5 years for 30 man-years) are indeed the expression of nonlinear trades of cost for time.
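The arithmetic behind these figures can be made explicit. This sketch uses only the numbers in the example above, taking six developers for the 5-year option (the text's 30 man-years); the "extra cost per year saved" measure is an added illustration, not a metric the text defines.

```python
# (developers, years) for the three staffing options in the example.
options = [(1, 10), (2, 7), (6, 5)]

# Cost in man-years is simply headcount multiplied by duration.
costs = [developers * years for developers, years in options]

# Extra man-years paid for each year of schedule saved between adjacent options.
extra_cost_per_year_saved = [
    (costs[i + 1] - costs[i]) / (options[i][1] - options[i + 1][1])
    for i in range(len(options) - 1)
]
```

The costs come out as 10, 14, and 30 man-years. Going from 10 years to 7 costs about 1.3 additional man-years per year saved, while going from 7 years to 5 costs 8 per year saved, which is what makes the trade nonlinear.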
图 9-2所示的时间成本曲线既理想又不现实。它假设,只要有足够多的预算,项目几乎可以立即完成。常识告诉你,这个假设是错误的。例如,再多的资金也无法在一个月内(甚至一年内)完成一个需要 10 人年的项目。所有压缩工作都有其自然极限。同样,图 9-2的时间成本曲线表明,如果有更多的时间,项目成本就会下降,而(如第 7 章所述)给予项目比所需更多的时间实际上会增加其成本。
The time–cost curve depicted in Figure 9-2 is both ideal and unrealistic. It assumes that given a large enough budget, the project could be done nearly instantaneously. Common sense tells you that assumption is wrong. For example, no amount of money can deliver a 10-man-year project in a month (or, for that matter, in a year). There is a natural limit to all compression efforts. Similarly, the time–cost curve of Figure 9-2 indicates that given more time, the cost of the project goes down, while (as discussed in Chapter 7) giving projects more time than is required actually drives up their cost.
虽然图 9-2中的时间成本曲线是不正确的,但可以讨论时间成本曲线上所有项目中都存在的点。这些点是一些经典规划假设的结果。图 9-3显示了实际的时间成本曲线。
While the time–cost curve of Figure 9-2 is incorrect, it is possible to discuss points on the time–cost curve that are present across all projects. These points are the result of a few classic planning assumptions. Figure 9-3 shows the actual time–cost curve.
图 9-3实际时间-成本曲线 [摘自 James M. Antill 和 Ronald W. Woodhead 所著《建筑实践中的关键路径》第 4 版(Wiley,1990 年),并经过修改。]
Figure 9-3 Actual time–cost curve [Adopted and modified from James M. Antill and Ronald W. Woodhead, Critical Path in Construction Practice, 4th ed. (Wiley, 1990).]
在设计项目时,您可以假设自己拥有无限的资源,并且每种资源在需要时都可以使用。同时,在设计项目时,您应该着眼于最低成本,避免要求比实际需要更多的资源。如第 7 章所述,您可以找到最低级别的资源,使您能够沿着关键路径畅通无阻地前进。这将为您提供构建系统的最便宜的方法和最高效的团队。这样的项目设计选项称为正常解决方案。正常解决方案代表构建系统的最不受约束或最自然的方式。
You can always design a project by assuming you have unlimited resources and that every resource will be available when required. At the same time, you should design the project with an eye for minimum cost, and avoid asking for more resources than are really required. As explained in Chapter 7, you can find the lowest level of resources that will allow you to progress unimpeded along the critical path. This will give you both the least expensive way of building the system and the most efficient team. Such a project design option is called the normal solution. The normal solution represents the most unconstrained or natural way of building the system.
假设该项目的正常解决方案的持续时间为一年。用一年以上的时间来完成同一个项目总是花费更多。额外的成本来自于更长的时间使用资源、累积的间接费用、镀金、增加的复杂性以及成功概率的降低。因此,时间成本曲线上正常右侧的点属于项目的不经济区域。
Suppose the duration of the normal solution of the project is a year. Giving the same project more than a year to complete always costs more. The additional cost comes from employing resources for longer periods of time, from the accumulated overhead, from gold plating, from increased complexity, and from the reduction of the probability of success. Therefore, points to the right of normal on the time–cost curve belong to the uneconomical zone of the project.
您可以使用本章前面介绍的部分或全部压缩技术来压缩正常解决方案。虽然所有压缩解决方案的持续时间都较短,但它们的成本也更高,很可能是以非线性的方式。显然,您应该只将压缩工作重点放在关键路径上的活动上,因为压缩非关键活动对进度没有任何作用。每个压缩解决方案都位于时间成本曲线上正常解决方案的左侧。
You can compress the normal solution by using some or all of the compression techniques described earlier in this chapter. While all of the resulting compressed solutions are of shorter duration, they also cost more, likely in nonlinear ways. Obviously, you should focus your compression effort only on activities on the critical path, because compressing noncritical activities does nothing for the schedule. Each compressed solution is to the left of the normal solution on the time–cost curve.
随着项目的压缩,成本不断上升。在某个时候,关键路径将被完全压缩,因为没有更多候选人可以并行工作,而且你已经在关键活动上雇佣了最优秀的人才。当你达到这一点时,你就拥有了项目最短时间或最短持续时间的解决方案。每个项目总是有这样一个最短持续时间点,在这个点上,无论多少金钱、努力或意志力都无法更快地完成它。
As you compress the project, the cost keeps escalating. At some point the critical path will be fully compressed because there are no more candidates for parallel work and you have already employed the best people on the critical activities. When you reach this point, you have the least time or minimum duration solution of the project. Every project always has such a minimum duration point, where no amount of money, effort, or willpower can deliver it any faster.
虽然您无法以比最短可能工期更快的速度完成项目,但您总是会浪费金钱。没有什么可以阻止您压缩项目中的所有活动,无论是关键活动还是非关键活动。项目不会以比最短工期更快的速度完成,但肯定会花费更多。时间成本曲线上的这个点称为完全压缩点。
While you cannot build the project any faster than its shortest possible duration, you can always waste money. Nothing prevents you from compressing all activities in the project, critical and noncritical alike. The project will not complete any faster than minimum duration, but it will certainly cost more. This point on the time–cost curve is called the full compression point.
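A small model shows why the full compression point costs more without being any faster. All values below are assumptions made for illustration: three activities, each with a normal and a compressed (duration, cost) pair, where B and C both depend on A and the A-B chain is critical.

```python
# Hypothetical (duration in weeks, cost in man-weeks) per activity.
NORMAL = {"A": (4, 4), "B": (4, 4), "C": (2, 2)}
COMPRESSED = {"A": (3, 6), "B": (3, 6), "C": (1, 4)}
DEPENDS_ON = {"B": ["A"], "C": ["A"]}


def evaluate(compressed_activities):
    """Return (project duration, total cost) for a choice of compressed activities."""
    plan = {
        name: COMPRESSED[name] if name in compressed_activities else NORMAL[name]
        for name in NORMAL
    }
    finish = {}  # cache: activity -> earliest finish time

    def earliest_finish(name):
        if name not in finish:
            start = max(
                (earliest_finish(p) for p in DEPENDS_ON.get(name, ())),
                default=0,
            )
            finish[name] = start + plan[name][0]
        return finish[name]

    duration = max(earliest_finish(name) for name in plan)
    cost = sum(cost for _, cost in plan.values())
    return duration, cost


normal = evaluate(set())                         # nothing compressed
minimum_duration = evaluate({"A", "B"})          # critical activities only
full_compression = evaluate({"A", "B", "C"})     # everything compressed
```

In this toy network the normal solution is 8 weeks at a cost of 10, compressing only the critical activities gives 6 weeks at a cost of 14, and compressing the noncritical activity C as well still takes 6 weeks but costs 16.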
图 9-3所示的实际时间成本曲线在正常解决方案和最短工期解决方案之间提供了无限个点。然而,没有人有时间设计无限数量的项目解决方案,也没有必要这样做。相反,架构师和项目经理必须为管理层提供正常解决方案和最短工期解决方案之间的一两个实际点。这些选项代表了管理层可以选择的合理的时间与成本权衡,并且始终是某种网络压缩的结果。因此,在项目设计期间实际生成的曲线是一个离散模型,如图9-4所示。虽然图 9-4的时间成本曲线比图 9-3少得多,但它有足够的信息来正确辨别项目的行为方式。
The actual time–cost curve shown in Figure 9-3 offers infinite points between the normal solution and the minimum duration solution. Yet no one has the time to design an infinite number of project solutions, nor is there any need to do so. Instead, the architect and the project manager must provide management with one or two practical points in between the normal solution and the minimum duration solution. These options represent some reasonable trade of time for cost from which management can choose and are always the result of some network compression. As a result, the curve you actually produce during project design is a discrete model, as shown in Figure 9-4. While the time–cost curve of Figure 9-4 has far fewer points than Figure 9-3, it has enough information to discern correctly how the project behaves.
图 9-4离散时间成本曲线[摘自 James M. Antill 和 Ronald W. Woodhead 所著《建筑实践中的关键路径》,第 4 版(Wiley,1990 年),并经过修改。]
Figure 9-4 Discrete time–cost curve [Adopted and modified from James M. Antill and Ronald W. Woodhead, Critical Path in Construction Practice, 4th ed. (Wiley, 1990).]
你应该向管理层介绍不切实际的完全压缩和不经济的解决方案,因为许多管理人员根本不知道这些解决方案不切实际。管理人员对项目行为的思维模型可能存在错误——最有可能是图 9-2中所示的思维模型。当你有错误的思维模型时,你总是会做出错误的决定。
You should present the impractical full compression and the uneconomical solutions to management, because many managers are simply unaware of their impracticality. Managers may have the wrong mental model of how a project behaves—most likely the mental model depicted in Figure 9-2. When you have the wrong mental model, you always make the wrong decisions.
假设进度是最优先的事项,而经理不介意花费一切代价来履行承诺。经理可能认为,可以投入资金和人力来推动团队在最后期限前完成任务,尽管再多的钱也无法在最短时间内完成任务。
Suppose schedule is the utmost priority, and the manager does not mind spending whatever it takes to meet the commitment. The manager may think that it is possible to throw money and people at the project to push the team toward a deadline, even though no amount of money can deliver below minimum duration.
预算有限但计划可修改性更强的经理也很常见。这样的经理可能会试图通过不充分地为项目配备人员或不向项目提供所需资源来削减成本。这样做会将项目从正常解决方案推入不经济的区域,再次导致成本大幅增加。
It is just as common to find managers with a constrained budget but a more amenable schedule. Such a manager may attempt to cut the cost by subcritically staffing the project or by not giving the project the required resources. Doing so pushes the project to the right of the normal solution into the uneconomical zone, again causing it to cost much more.
时间成本曲线显示了项目的一个至关重要的方面:可行性。代表曲线上或曲线上方各点的时间和成本的项目设计解决方案是可行的。例如,考虑图 9-5 中的点 A。解决方案 A 需要 T2 的时间和 C1 的成本。虽然 A 是一个可行的解决方案,但它不是最优的。如果 T2 是可接受的截止日期,那么该项目也可以以 C2 的成本交付,即时间成本曲线在时间 T2 处的值。因为 A 高于曲线,所以 C2 < C1。相反,如果 C1 的成本是可以接受的,那么以相同的成本也可以在 T1 的时间内交付项目,即时间成本曲线在成本 C1 处的值。由于 A 位于曲线右侧,所以 T1 < T2。
The time–cost curve shows a paramount aspect of the project: feasibility. Project design solutions of time and cost representing points at or above the curve are doable. For example, consider the point A in Figure 9-5. The A solution calls for T2 time and C1 cost. While A is a feasible solution, it is suboptimal. If T2 is an acceptable deadline, then the project could also be delivered for C2 cost, the value of the time–cost curve at the time of T2. Because A is above the curve, it follows that C2 < C1. Conversely, if the cost of C1 is acceptable, then for the same cost it is also possible to deliver the project in a time of T1, the value of the time–cost curve at the cost of C1. Since A is to the right of the curve, it follows that T1 < T2.
图 9-5时间成本曲线上方的次优解决方案
Figure 9-5 Suboptimal solution above the time–cost curve
时间成本曲线上的点代表了时间与成本的最优权衡。时间成本曲线是最优的,因为更快地(以相同的成本)或以更低的成本(在相同的期限内)交付项目总是更好的。您可能会做得比时间成本曲线更差,但不可能更好。这也意味着时间成本曲线下方的点是不可能实现的。例如,考虑图 9-6 中的点 B。解决方案 B 需要 T3 的时间和 C4 的成本。但是,要在 T3 的时间交付项目,至少需要 C3 的成本。由于 B 低于时间成本曲线,因此 C3 > C4。如果 C4 是您所能负担的全部,那么该项目至少需要 T4 的时间。由于 B 位于时间成本曲线的左侧,因此 T4 > T3。
The points on the time–cost curve represent the optimal trade of time for cost. The time–cost curve is optimal because it is always better to deliver the project faster (for the same cost) or at lower cost (for the same deadline). You could do worse than the time–cost curve, but not better. This also implies that points under the time–cost curve are impossible. For example, consider the point B in Figure 9-6. The B solution calls for T3 time and C4 cost. However, to deliver the project at time of T3 would require at least the cost of C3. Since B is below the time–cost curve, it follows that C3 > C4. If C4 is all you can afford, then the project would require at least the time of T4. Since B is left of the time–cost curve, it follows that T4 > T3.
图 9-6时间成本曲线下方的不可能解
Figure 9-6 Impossible solution below the time–cost curve
如果时间成本曲线上的点只是任意持续时间的最小成本,那么时间成本曲线会将该区域划分为两个区域。第一个区域是可行解决方案区域,涵盖时间成本曲线上方或与时间成本曲线相同的解决方案。第二个区域是死亡区域,涵盖时间成本曲线下方的所有解决方案,如图9-7所示。
If the points on the time–cost curve are simply minimum cost for any duration, then the time–cost curve divides the area into two zones. The first is the feasible solutions zone, covering solutions above or at the time–cost curve. The second zone is the death zone, encompassing all solutions below the time–cost curve, as shown in Figure 9-7.
图 9-7死亡地带
Figure 9-7 The death zone
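Once a discrete time–cost curve exists, checking whether a requested time and cost falls in the death zone is a mechanical comparison. The sketch below is one possible way to do it, under stated assumptions: the curve points are invented, times are strictly increasing from the minimum duration solution to the normal solution, and the cost between two designed options is estimated by linear interpolation, which the text does not prescribe.

```python
def min_cost_at(curve, time):
    """Interpolate the minimum cost for a duration, or None outside the curve."""
    if time < curve[0][0] or time > curve[-1][0]:
        return None
    for (t0, c0), (t1, c1) in zip(curve, curve[1:]):
        if t0 <= time <= t1:
            return c0 + (c1 - c0) * (time - t0) / (t1 - t0)
    return curve[0][1]  # curve with a single point


def classify(curve, time, cost):
    """Place a (time, cost) request relative to the time-cost curve."""
    if time < curve[0][0]:
        return "death zone"  # faster than the minimum duration solution
    floor = min_cost_at(curve, time)
    if floor is None:
        return "outside modeled range"  # right of the last designed option
    if cost < floor:
        return "death zone"
    if cost == floor:
        return "on curve"
    return "feasible but suboptimal"


# Hypothetical options as (months, man-months), minimum duration first.
curve = [(8, 60), (9, 48), (10, 42), (12, 40)]

point_a = classify(curve, 10, 50)  # above the curve, like point A
point_b = classify(curve, 9, 40)   # below the curve, like point B
```

Here `point_a` is feasible but suboptimal because 10 months needs only 42 man-months, while `point_b` is in the death zone because 9 months needs at least 48.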
我再怎么强调也不为过,不要在死亡地带设计项目。在死亡地带的项目在有人写下第一行代码之前就已经失败了。成功的关键既不是架构也不是技术,而是避免选择死亡地带的项目。
I cannot stress enough how important it is not to design a project in the death zone. Projects in the death zone have failed before anyone writes the first line of code. The key to success is neither architecture nor technology, but rather avoiding picking a project in the death zone.
在寻找正常解决方案时,通常事先并不知道顺利完成关键路径所需的最低人员配备水平。例如,您可以为项目配备 12 名开发人员,但这并不意味着您不能在不延误项目的情况下为项目配备 8 名甚至 6 名开发人员。因此,寻找正常的人员配备水平是一个反复的过程,如图9-8所示。
When searching for the normal solution, the minimum level of staffing required to proceed unimpeded along the critical path is often unknown up front. For example, the fact that you could staff the project with 12 developers does not mean you could not have done so with 8 or even 6 developers without delaying the project. Consequently, finding the normal level of staffing is an iterative process, as shown in Figure 9-8.
图 9-8寻找正常解决方案
Figure 9-8 Finding the normal solution
对于每次尝试正常解决方案,您都会逐步用更多的浮动时间换取资源。由于浮动时间减少,这种交易自然会增加项目的风险。这也意味着真正的正常解决方案已经具有相当大的风险。但是,真正的正常解决方案所需的最低人员配备水平通常在风险方面已经足够好,因为有足够的浮动时间可以满足项目的承诺。
For each iterative attempt at a normal solution, you progressively trade more float for resources. This trade will naturally increase the risk of the project due to the reduced float. It also means that the true normal solution already has a considerable level of risk. However, the lowest staffing level required for the true normal solution is often good enough risk-wise, because sufficient float remains to meet the project’s commitments.
在寻找正常解决方案的人员配备水平时,您应该根据实际情况做出细微的调整。例如,在一个为期一年的项目中,如果可以通过将计划延长一周来避免雇用其他资源,那么您可能应该接受这种交易。找到简化项目执行或降低其集成风险的方法也是一个好主意,以换取稍微延长持续时间或稍微增加成本。您应该始终倾向于根据实际情况做出这些调整。因此,中间的正常尝试可能不会完全垂直对齐(如图 9-8所示),而是可能会稍微向右或向左偏移。
When looking for the staffing level of the normal solution, you should make minor accommodations for reality. For example, in a year-long project, if you could avoid hiring another resource by extending the schedule by a week, then you should probably take that trade. It is also a good idea to find ways of simplifying the project execution or reducing its integration risk in exchange for a slight extension of the duration or a small increase in cost. You should always prefer these accommodations for reality. As a result, the intermediate normal attempts may not be aligned exactly vertically atop each other (as in Figure 9-8), but rather may drift a little to the right or to the left.
什么构成了为适应现实而进行的微小调整,什么构成了在寻找正常解决方案时意图的扭曲,这需要判断。我的经验法则是,任何少于 2-3% 的进度或成本都是噪音水平,可以进行此类修改。2-3% 的指导方针与项目设计和跟踪分辨率有关。如果活动只有一周的粒度,并且如果项目是按周跟踪的,那么在一年内,设计和测量分辨率是 2%,任何更精确的东西都只是噪音。第11 章和第 13 章展示了此类调整。
What constitutes a minor adjustment to accommodate reality and what qualifies as a distortion of the intent in finding the normal solution is a judgment call. My rule of thumb is that anything less than 2–3% of the schedule or the cost is at the noise level and is fair game for such modifications. The 2–3% guideline is related to the project design and tracking resolution. If activities are only as granular as one week, and if the project is tracked on a weekly basis, then over a year, the design and measurement resolution is 2%, making anything more precise just noise. Chapter 11 and Chapter 13 demonstrate such accommodations.
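The resolution argument above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the author's tooling: the function names and the 3% default threshold (the upper end of the 2–3% guideline) are assumptions made for illustration.

```python
def tracking_resolution(granularity_weeks: float, project_weeks: float) -> float:
    """Smallest fraction of the schedule that the plan can resolve."""
    return granularity_weeks / project_weeks


def is_noise_level(change: float, baseline: float, threshold: float = 0.03) -> bool:
    """Return True when a change is below the threshold fraction of the baseline.

    The 0.03 default is an assumption taken from the upper end of the 2-3% guideline.
    """
    return abs(change) / baseline < threshold


# One-week activities tracked weekly over a 52-week project.
resolution = tracking_resolution(granularity_weeks=1, project_weeks=52)
print(f"design resolution: {resolution:.1%}")

# Extending a 52-week schedule by one week versus by three weeks.
one_week_is_noise = is_noise_level(change=1, baseline=52)
three_weeks_is_noise = is_noise_level(change=3, baseline=52)
print(one_week_is_noise, three_weeks_is_noise)
```

With these inputs, a one-week extension falls under the threshold while a three-week extension does not, which mirrors the judgment call described above.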
到目前为止,对项目成本的讨论都过于简单,因为项目的总成本由两个成本要素组成:直接成本和间接成本。在设计项目时,您应该计算这两个成本要素以及项目的总成本。了解项目成本要素之间的相互作用对于合理的项目设计和决策至关重要。
The discussion of the project cost so far has been simplistic because the total cost of the project is composed of two elements of cost: direct cost and indirect cost. As you design the project, you should calculate both of these elements of cost as well as the project’s total cost. Understanding the interplay between the cost elements of the project is crucial for sound project design and decision making.
项目的直接成本包括为项目增加直接可衡量价值的活动。这些活动与项目计划挣值图中显示的明确项目活动相同。如第 7 章所述,计划挣值(以及直接成本)在项目生命周期内不断变化,从而形成一条平缓的 S 曲线。
The project’s direct cost comprises activities that add direct measurable value to the project. These are the same explicit project activities shown in the project’s planned earned value chart. As explained in Chapter 7, the planned earned value (and hence the direct cost) varies over the project’s lifetime, resulting in a shallow S curve.
软件项目的直接成本通常包括以下项目:
The direct cost of a software project typically includes the following items:
致力于服务的开发人员
Developers working on services
执行系统测试的测试人员
Testers performing system testing
设计数据库的数据库架构师
Database architect designing a database
测试工程师设计和构建测试工具
Test engineers designing and building a test harness
设计用户界面和用户体验的 UI/UX 专家
UI/UX experts designing the user interface and the user experience
设计系统或项目的架构师
The architect designing the system or the project
该项目直接成本曲线如图9-3所示。
The direct cost curve of the project looks like Figure 9-3.
项目的间接成本包括为项目增加间接的、不可衡量的价值的活动。此类活动通常是持续进行的,不会显示在挣值图或项目计划中。
The project’s indirect cost comprises activities that add indirect immeasurable value to the project. Such activities are typically ongoing, and are not shown in the earned value charts or the project plan.
软件项目的间接成本通常包括以下项目:
The indirect cost of a software project typically includes the following items:
SDP评审后的核心团队(即架构师、项目经理、产品经理)
The core team (i.e., architect, project manager, product manager) after the SDP review
持续的配置管理、每日构建和每日测试,或者一般的 DevOps
Ongoing configuration management, daily build and daily test, or DevOps in general
假期和节假日
Vacations and holidays
在两项任务之间仍被占用的资源
Committed resources between assignments
大多数项目的间接成本与项目持续时间大致成正比。项目耗时越长,间接成本就越高。如果你绘制项目随时间变化的间接成本图,应该会得到一条大致的直线。
The indirect cost of most projects is largely proportional to the duration of the project. The longer the project takes, the higher the indirect cost. If you were to plot the indirect cost of the project over time, you should get roughly a straight line.
把间接成本视为不必要的开销是错误的。如果没有专门的架构师和项目经理,项目就会失败,但在 SDP 审查之后,他们在计划中没有明确的活动。
It is wrong to think of indirect cost as needless overhead. The project will fail without a dedicated architect and a project manager, yet after the SDP review, they have no explicit activities in the plan.
直接成本和间接成本的概念太多了。有些人认为直接成本是与直接团队成员相关的成本,而间接成本是外部顾问或分包商的成本。其他人则将直接成本简单地定义为他们必须支付的成本,将间接成本定义为其他人或组织必须支付的成本。然而,谁最终为资源买单是一个会计问题,而不是项目设计问题。本章中的定义完全是从价值的角度出发的:资源或活动是否增加了可衡量的价值或不可衡量的价值?
The concepts of direct and indirect costs are overloaded. Some regard direct cost as costs associated with direct team members and indirect cost as costs of external consultants or subcontractors. Others define direct cost simply as that for which they must pay, and indirect cost as that for which other people or organizations must pay. However, the question of who ends up paying for the resources is an accounting question, not a project design question. The definitions in this chapter are strictly from a value perspective: Does a resource or an activity add a measurable or immeasurable value?
项目总成本是其直接成本与间接成本的总和:
The total cost of the project is the sum of its direct and indirect costs:
总成本=直接成本+间接成本
Total Cost = Direct Cost + Indirect Cost
根据直接成本和间接成本的定义,图 9-9 显示了成本的两个要素以及由此产生的项目总成本。
Given the definitions of direct and indirect costs, Figure 9-9 shows the two elements of cost and the resulting total cost of the project.
图 9-9项目直接成本、间接成本和总成本曲线[摘自 James M. Antill 和 Ronald W. Woodhead 所著《建筑实践中的关键路径》第 4 版(Wiley,1990 年),并作了修改。]
Figure 9-9 Project direct, indirect, and total cost curves [Adopted and modified from James M. Antill and Ronald W. Woodhead, Critical Path in Construction Practice, 4th ed. (Wiley, 1990).]
间接成本显示为一条直线,直接成本曲线与前面几幅图中的曲线相同。图 9-9中的直接和间接曲线都是离散解的乘积,总成本曲线是这些点的直接成本和间接成本之和。
The indirect cost is shown as a straight line, and the direct cost curve is the same curve shown in the previous figures. Both the direct and indirect curves in Figure 9-9 are the product of discrete solutions, and the total cost curve is the sum of the direct and indirect costs at each of these points.
与直接成本曲线一样,总成本曲线上方的解决方案是可行的,而总成本曲线下方的解决方案则是不可能的。因此,总成本曲线下方的区域是项目的实际死亡区,因为它同时考虑了间接成本和直接成本。仅仅位于直接成本曲线上的死亡区上方可能并不意味着项目已经脱离危险,因为您仍然需要支付间接成本。花时间对总成本曲线进行建模,然后简单地观察您给出的参数是否留下了成功的机会。您将在第 11 章中看到如何做到这一点。
As with the direct cost curve, solutions above the total cost curve are feasible, while solutions below it are impossible. Therefore, the area below the total cost curve is the actual death zone of the project because it takes into account both the indirect cost and the direct cost. Merely being above the death zone on the direct cost curve may not mean the project is out of danger because you still have to pay for the indirect cost. Spend the time to model the total cost curve and then simply observe if the parameters you are given leave any chance for success. You will see how to do just that in Chapter 11.
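One way to "simply observe" whether the given parameters leave any chance of success is to test them against the total cost of each designed solution. The sketch below is an illustration under stated assumptions: the `Solution` class, the `in_death_zone` helper, and all numbers are hypothetical, and the check considers only the discrete solutions you have actually designed rather than a continuous curve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Solution:
    name: str
    duration: float       # months
    direct_cost: float    # man-months
    indirect_cost: float  # man-months

    @property
    def total_cost(self) -> float:
        # Total Cost = Direct Cost + Indirect Cost
        return self.direct_cost + self.indirect_cost


def in_death_zone(solutions, deadline: float, budget: float) -> bool:
    """True when no designed solution meets both the deadline and the budget."""
    return not any(
        s.duration <= deadline and s.total_cost <= budget for s in solutions
    )


# Hypothetical design solutions (illustrative numbers only).
solutions = [
    Solution("compressed 2", duration=9, direct_cost=68, indirect_cost=36),
    Solution("compressed 1", duration=10, direct_cost=56, indirect_cost=40),
    Solution("normal", duration=12, direct_cost=50, indirect_cost=48),
]

# 12 months and 60 man-months clears the direct cost of the normal solution (50)
# but not its total cost (98), so the request is still in the death zone.
print(in_death_zone(solutions, deadline=12, budget=60))   # True
print(in_death_zone(solutions, deadline=10, budget=100))  # False
```

Because only designed solutions are tested, the check is conservative: a request that falls between two designed points is reported as infeasible until you design a solution that covers it.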
压缩的项目设计方案将缩短项目的工期,因此也将减少项目的间接成本。这反过来又会抵消压缩项目的成本。例如,在图 9-9中,考虑正常解决方案(直接成本曲线上的最低点)与其左侧的压缩解决方案之间的所有三条曲线上的线段。在直接成本曲线上,压缩解决方案在这两点之间有相当大的额外成本。然而,在总成本曲线上,一旦将间接成本考虑在内,总成本的差异就会大大缩小。正常解决方案和第一个压缩点之间的总成本略有增加,就会得到相同的进度缩短。累积的间接成本使压缩更引人注目,至少在最初是这样,因为压缩往往会收回成本。在许多项目中,间接成本的减少甚至可能比进度缩短带来更大的好处。
A compressed project design solution will reduce the duration of the project and, therefore, will also reduce the indirect cost of the project. This, in turn, tends to offset the cost of compressing the project. For example, in Figure 9-9, consider the segments on all three curves between the normal solution (the lowest point on the direct cost curve) and the compressed solution to its left. On the direct cost curve, the compressed solution has a substantial additional cost between these two points. However, on the total cost curve, once you factor in the indirect cost, the difference in total cost is much reduced. For only a small increase in total cost between the normal solution and the first compression point, you get the same reduction in schedule. The accumulated indirect cost makes compressing more compelling, at least initially, because compression will tend to pay for itself. In many projects the reduction in indirect cost may be an even greater benefit than the schedule reduction.
对于直接成本曲线,正常点按定义也是最低成本解决方案。其右侧是不经济区域,其左侧是压缩解决方案,这些解决方案会用时间来换取额外的成本。但是,一旦您添加了间接成本以找到每个设计解决方案的项目总成本,最低总成本解决方案将不再是正常解决方案。添加间接成本会将最低总成本点移至正常解决方案左侧的某个位置。此外,间接成本线的斜率越陡,向最低总成本点左侧的移动就越明显。
With the direct cost curve, the normal point is by definition also the minimum cost solution. To its right is the uneconomical zone, and to its left are the compressed solutions that trade time for additional cost. However, once you have added the indirect cost to find the total cost of the project for each of the designed solutions, the minimum total cost solution will no longer be the normal solution. Adding the indirect cost will shift the minimum total cost point somewhere to the left of the normal solution. Moreover, the steeper the slope of the indirect cost line, the more significant the shift to the left of the minimum total cost point.
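The shift of the minimum total cost point can be reproduced numerically. The following sketch assumes a hypothetical three-point direct cost curve and a linear indirect cost (a constant rate per month); the function names and all numbers are illustrative and do not come from the figures.

```python
def total_costs(direct_points, indirect_rate):
    """Return (duration, total cost) pairs for (duration, direct cost) points."""
    return [(d, direct + indirect_rate * d) for d, direct in direct_points]


def minimum_total_cost_point(direct_points, indirect_rate):
    """Return the (duration, total cost) pair with the lowest total cost."""
    return min(total_costs(direct_points, indirect_rate), key=lambda p: p[1])


# Hypothetical direct cost curve: (duration in months, direct cost in man-months).
# The last point is the normal solution, the minimum of the direct cost curve.
direct_points = [(9, 68), (10, 56), (12, 50)]

for rate in (1, 4, 15):  # indirect cost in man-months per month
    duration, cost = minimum_total_cost_point(direct_points, rate)
    print(f"indirect rate {rate}: minimum total cost {cost} at {duration} months")
```

With a shallow indirect cost line (rate 1), the normal solution at 12 months remains the cheapest of these discrete points. At rate 4 the minimum moves to the 10-month solution, and at rate 15 it moves to the 9-month solution, so the steeper the line, the further left the minimum lands.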
举例来说,请看图 9-10。在直接成本曲线上,正常解决方案显然是成本最低的点。然而,在总成本曲线上,成本最低的点是正常解决方案左侧的第一个压缩解决方案。在这种情况下,压缩项目实际上降低了项目成本。这使得项目总成本最低的点从时间成本角度成为最佳项目设计方案,因为它比正常情况更快地完成项目,并且总成本更低。
As an example, consider Figure 9-10. On the direct cost curve the normal solution is clearly the point of minimum cost. However, on the total cost curve, the point of minimum cost is the first compressed solution to the left of the normal solution. In this case, compressing the project has actually reduced the cost of the project. This makes the point of minimum total cost of the project the optimal project design option from a time–cost perspective because it completes the project faster than normal and at a lower total cost.
图 9-10 高间接成本使最小总成本向正常解决方案的左侧移动
Figure 9-10 A high indirect cost shifts minimum total cost left of the normal solution
在图 9-10中,由于图表的离散性,总成本最低点向左移动的现象更加明显。也就是说,即使是连续图表(例如图 9-11),也总会向左移动,其间接成本线的斜率低于图 9-10中所示的斜率。
In Figure 9-10, the shift to the left of the point of minimum total cost is accentuated because of the discrete nature of the charts. That said, there is always a shift to the left, even with a continuous chart such as Figure 9-11, which has a lower slope of the indirect cost line than that shown in Figure 9-10.
图 9-11 连续时间成本曲线左移
Figure 9-11 Shift to the left on a continuous time–cost curve
因为你永远只能开发一小部分项目设计解决方案,所以你构建的时间成本曲线将始终是项目的离散模型。考虑到您拥有的解决方案(以及间接成本水平),正常解决方案可能确实是总成本最低的点,如图9-9所示。然而,这种结果具有误导性,只是缺少一些略微偏离正常的未知设计解决方案的产物。
Because you will only ever develop a small set of project design solutions, the time–cost curve you build will always be a discrete model of the project. With the solutions you have (and the level of indirect cost), the normal solution may indeed be the point of minimum total cost, as in Figure 9-9. However, that outcome is misleading and is simply an artifact of missing some unknown design solution slightly left of normal.
图 9-12与图 9-9相同,只是它在正常解的左侧立即添加了未知点,以说明向最小总成本点左侧的移动。
Figure 9-12 is the same as Figure 9-9 except that it adds that unknown point immediately to the left of the normal solution to illustrate the shift to the left of the point of minimum total cost.
图 9-12 最小总成本作为未知点
Figure 9-12 Minimum total cost as an unknown point
这种情况的问题在于您不知道如何做出该解决方案:您不知道资源和压缩的哪种组合会产生该点。虽然理论上总是存在这样的解决方案,但在实践中,对于大多数项目,您可以将正常解决方案的总成本与项目的最低总成本相等。正常解决方案的总成本与真正的最低总成本点之间的差异通常不足以证明找到该确切点所需的努力是合理的。
The problem with this situation is that you have no idea how to make that solution: You do not know what combination of resources and compression yields this point. While such a solution always exists in theory, in practice for most projects you can equate the total cost of the normal solution with the minimum total cost of the project. The difference between the total cost of the normal solution and the point of true minimum total cost often does not justify the effort required to find that exact point.
间接成本线的斜率越陡,向最小总成本点左侧的移动越深。如果间接成本高,压缩解决方案之一可能就是项目的最佳点,因为其成本低于正常解决方案,同时交付时间更短。然而,这也带来了一个问题:压缩程度越高的项目设计方案通常风险越高。这种风险既可能归因于项目的关键性,也可能归因于其增加的执行复杂性。因此,间接成本高意味着项目的最佳设计点可能是一个高风险的选择。将最佳选择作为有风险的选择几乎不是成功的秘诀。这意味着间接成本高的项目几乎总是高风险项目。您将在后面的章节中看到如何应对这些风险。
The steeper the slope of the indirect cost line, the more profound the shift to the left of the point of minimum total cost is. With a high indirect cost, one of the compressed solutions will likely be the optimal point of the project since its cost will be lower than the normal solution while delivering on a shorter schedule. However, this presents a problem: The more compressed project design solutions typically carry a higher risk. This risk can be due both to the criticality of the project and to its increased execution complexity. A high indirect cost therefore means that the optimal design point of the project is likely to be a high-risk option. Having your best option be a risky option is hardly a recipe for success. The implication is that projects with high indirect cost are almost always also high-risk projects. You will see how to address these risks in the following chapters.
对于每个项目设计解决方案,您必须考虑直接成本和间接成本。如第 7 章所述,软件项目的总成本是人员配置分布图下的区域。如果您知道项目总成本和其中一个成本要素(例如直接成本),则可以提取其他成本元素,方法是将其从总成本中减去。对于每个项目设计解决方案,您首先要为项目配备人员,然后绘制计划挣值图和计划人员分配图。接下来,计算人员分配图下的总成本区域,并汇总所有直接成本活动(您在挣值图上显示的活动)的工作量。间接成本只是两者之间的差额。
For each of your project design solutions, you must account for both direct and indirect cost. As established in Chapter 7, the total cost of a software project is the area under the staffing distribution chart. If you know the project total cost and one of the cost elements such as the direct cost, you can extract the other cost element by subtracting it from the total cost. For each project design solution, you first staff the project, then draw the planned earned value chart and the planned staffing distribution chart. Next, you calculate the area under the staffing distribution chart for the total cost, and you also sum up the effort across all direct cost activities (the ones you show on the earned value chart). The indirect cost is simply the difference between the two.
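The subtraction described above is straightforward to sketch. In the example below, the stepwise staffing distribution, the activity names, and the effort figures are all hypothetical; only the method (area under the staffing chart minus the summed direct effort) follows the text.

```python
def area_under_staffing(staffing_steps):
    """Total cost as the area under a stepwise staffing distribution.

    staffing_steps: list of (duration_in_months, headcount) pairs.
    """
    return sum(months * people for months, people in staffing_steps)


# Hypothetical staffing distribution: core team up front, ramp-up, wind-down.
staffing_steps = [(3, 3), (6, 8), (3, 5)]

# Hypothetical effort, in man-months, of the activities on the earned value chart.
direct_effort = {
    "services": 16,
    "system testing": 3,
    "test harness": 2,
    "UI/UX design": 2,
    "database design": 1,
}

total_cost = area_under_staffing(staffing_steps)  # 9 + 48 + 15 = 72 man-months
direct_cost = sum(direct_effort.values())         # 24 man-months
indirect_cost = total_cost - direct_cost          # 48 man-months
print(total_cost, direct_cost, indirect_cost)
```

The numbers were chosen so that direct cost to indirect cost comes out at 1:2; a real project's ratio depends on its own staffing plan.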
图 9-13以图形方式显示了人员分配图下成本要素的典型细分(另请参阅图 7-8)。在项目的前端,只有核心团队参与,而这些工作的大部分涉及间接成本。核心团队付出的其余努力确实有一些直接价值,例如设计系统和项目。然而,经过 SDP 评审后,核心团队变成了纯粹的间接成本。在 SDP 评审之后,项目还有额外的持续间接成本,例如 DevOps、每日构建和每日测试。其余人员是直接成本,例如开发人员构建系统。
Graphically, Figure 9-13 shows a typical breakdown of the cost elements under the staffing distribution chart (also refer to Figure 7-8). In the front end of the project, only the core team is engaged, and much of that work involves indirect cost. The rest of the effort expended by the core team does have some direct value, such as designing the system and the project. However, past the SDP review, the core team turns into pure indirect cost. After the SDP review, the project has additional ongoing indirect costs such as DevOps, daily build, and daily tests. The rest of the staffing is a direct cost, such as developers building the system.
图 9-13 人员配置分布下的成本要素
Figure 9-13 Cost elements under the staffing distribution
从图 9-13可以看出,典型的软件项目的间接成本要高于直接成本。大多数人没有意识到交付高质量、复杂的软件系统需要多少间接成本。直接成本与间接成本之比为 1:2 很常见,但这个比例很容易更高。直接成本与间接成本的确切比率通常是业务性质的一个方面。例如,与生产常规业务系统的公司相比,生产航空电子设备的公司会拥有更高的间接成本。
As you can see from Figure 9-13, a typical software project will have more indirect cost than direct cost. Most people fail to recognize just how much indirect cost is required to deliver a high-quality, complex software system. A ratio of 1:2 of direct cost to indirect cost is quite common, but this ratio can easily be higher. The exact ratio of direct cost to indirect cost is often an aspect of the nature of the business. For example, you would expect a higher indirect cost in a company that produces avionics compared with one that produces a regular line of business systems.
在软件项目中,间接成本通常是总成本的主要因素。这导致了一个关键的观察结果:在其他条件相同的情况下,较短的项目总是成本较低,因为它们产生的间接成本较少。无论您如何实现较短的进度,无论是通过压缩项目还是采用本章开头的最佳实践,情况都是如此。即使压缩项目需要额外的资源甚至更昂贵的资源,较短的项目成本也会较低。
In a software project the indirect cost is often the dominant element of the total cost. This leads to a key observation: All things being equal, shorter projects always cost less simply because they incur less indirect cost. This is the case regardless of how you achieve the shorter schedule—whether by compressing the project or by employing the best practices which opened this chapter. A shorter project costs less even when compressing the project requires additional resources or even more expensive resources.
不幸的是,许多经理根本不知道项目越短,成本就越低,这导致了一个典型的错误。当预算紧张时,经理会试图通过限制资源(即资源的质量或数量)来降低成本。这会使项目时间更长,最终成本会更高。
Unfortunately, many managers are simply unaware that shorter projects will cost less, which leads to a classic mistake. When faced with a tight budget, the manager will try to reduce the cost by throttling the resources (i.e., either the quality or the quantity of the resources). This will make the project longer, so that it ends up costing much more.
软件项目还有另一个不随时间变化的成本要素。固定成本可能包括计算机硬件和软件许可证。项目的固定成本表现为间接成本线的恒定上移(图 9-14)。
Software projects have yet another element of cost that is fixed with time. Fixed cost might include computer hardware and software licenses. The fixed cost of the project is expressed as a constant shift up of the indirect cost line (Figure 9-14).
图 9-14 添加固定成本
Figure 9-14 Adding fixed cost
因为固定成本只是将总时间成本曲线向上移动,所以它对决策过程没有任何帮助,因为它对所有选项的影响几乎是相等的(可能会随着团队规模而略有变化)。在大多数规模适中的软件项目中,固定成本约占总成本的 1-2%,因此通常可以忽略不计。
Because the fixed cost merely shifts the total time–cost curve up, it adds nothing to the decision-making process, as it affects all options almost equally (it may change slightly with the team size). In most decent-size software projects, the fixed cost will be approximately 1–2% of the total cost, so it is typically negligible.
压缩项目将改变项目网络。压缩应该是一个迭代过程,在这个过程中,你要不断寻找最佳的下一步。你从项目的正常解决方案开始压缩项目。正常解决方案应该对压缩反应良好,因为它处于时间成本曲线的最小值。如前所述,最初压缩甚至可能最终收回成本。此外,在正常解决方案的左侧,时间成本曲线是最平坦的。这意味着你的前一两个压缩点将提供压缩成本的最佳投资回报 (ROI)。然而,随着你进一步压缩项目,你将开始攀升时间成本曲线,最终体验到压缩成本的收益递减。项目将提供越来越少的进度缩减,同时产生越来越高的成本,就好像项目抵制更多的压缩一样。当压缩整个项目时,你应该尝试通过压缩先前压缩的解决方案来复合效果,而不仅仅是在基线正常解决方案上尝试一种新的压缩技术。
Compressing the project will change the project network. Compression should be an iterative process in which you constantly look for the best next step. You start compressing the project from its normal solution. The normal solution should respond well to compression because it is at the minimum of the time–cost curve. As noted, initially the compression may even end up paying for itself. In addition, immediately to the left of the normal solution, the time–cost curve is at its flattest. This means that your first one or two compression points will provide the best return on investment (ROI) of the compression cost. However, as you compress the project further, you will start climbing up the time–cost curve, eventually experiencing diminishing returns on the cost of compression. The project will offer less and less reduction in schedule while incurring ever higher cost, as if the project resists more compression. When compressing the project as a whole, you should attempt to compound the effect by compressing a previously compressed solution, not just trying a new compression technique on the baseline normal solution.
您应该避免压缩那些无论花费多少成本都无法很好地压缩的活动(例如架构)或已经完全压缩的活动。由于即使是单个活动也有自己的时间成本曲线,最初一项活动可能很容易压缩,但后续压缩将需要额外的成本来爬升活动自己的时间成本曲线。在某个时候,活动将无法进一步压缩。因此,一般来说,压缩其他活动比反复压缩同一活动更好。
You should avoid compressing activities that will not respond well to compression regardless of the cost spent on them (such as architecture) or activities that are already fully compressed. Since even individual activities have their own time–cost curve, initially an activity may be easy to compress, but subsequent compression will require additional cost to climb the activity’s own time–cost curve. At some point the activity will be impossible to compress any further. For this reason, it is better, in general, to compress other activities than to repeatedly compress the same activity.
理想情况下,你应该只压缩关键路径上的活动。压缩关键路径之外的活动几乎没有任何意义,因为这样做只会增加成本而不会缩短进度。同时,你不应该盲目地压缩关键路径上的所有活动。最适合压缩的活动是那些为压缩提供最佳投资回报的活动。压缩这些活动将以最少的额外成本换取最大的进度缩减。活动的持续时间也很重要,因为所有压缩技术都具有破坏性,会增加项目的风险和复杂性。最好让大型关键活动承受这些影响,从而获得最大的进度缩减。通常还建议将大型活动拆分为较小的活动,这是压缩大型活动的一个很好的副作用。
Ideally, you should compress only activities on the critical path. There is hardly ever any point in compressing activities outside the critical path because doing so will just drive the cost up without shortening the schedule. At the same time, you should not blindly compress all activities on the critical path. The best candidates for compression are activities that offer the best ROI for the compression. Compression of these activities will yield the most reduction in schedule for the least additional cost. The duration of the activity also matters, because all compression techniques are disruptive and will increase the risk and complexity of the project. It is better to incur these effects on a large critical activity and gain the most reduction in schedule. It is also generally advisable to split large activities into smaller ones—a nice side effect of compressing a large activity.
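Selecting the next activity to compress can be sketched as a ranking by schedule reduction per unit of extra cost, restricted to the critical path. This is a simplified illustration, not a full compression procedure: `CompressionOption`, `best_candidate`, and the sample data are hypothetical, every option is assumed to have a positive extra cost, and the sketch ignores the preference for large activities and the risk each compression adds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CompressionOption:
    activity: str
    on_critical_path: bool
    days_saved: float   # schedule reduction if this activity is compressed
    extra_cost: float   # additional cost of the compression, assumed > 0


def best_candidate(options) -> Optional[CompressionOption]:
    """Pick the critical-path option with the most days saved per unit of cost."""
    candidates = [o for o in options if o.on_critical_path and o.days_saved > 0]
    if not candidates:
        return None  # nothing left to compress on the critical path
    return max(candidates, key=lambda o: o.days_saved / o.extra_cost)


# Hypothetical options for one compression iteration.
options = [
    CompressionOption("A", on_critical_path=True, days_saved=10, extra_cost=5),
    CompressionOption("B", on_critical_path=True, days_saved=5, extra_cost=1),
    CompressionOption("C", on_critical_path=False, days_saved=20, extra_cost=1),
]

choice = best_candidate(options)
print(choice.activity if choice else "no candidate")
```

Activity C offers the largest saving on paper but is excluded because it is off the critical path, so compressing it would add cost without shortening the schedule. After each iteration you would recompute the critical path before ranking again.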
随着关键路径的压缩,关键路径也会缩短。因此,另一条路径现在可能是项目网络中最长的路径;也就是说,出现了一条新的关键路径。您应该不断评估项目网络,以检测新关键路径的出现,并压缩该路径而不是旧的关键路径。如果出现多条关键路径,您必须找到同时压缩这些路径的方法,并且压缩量相同。例如,如果一项活动或一组活动限制了所有关键路径,那么下一次压缩迭代将针对它们。
As you compress the critical path, you will shorten it. As a result, another path may now be the longest in the project network; that is, a new critical path emerges. You should constantly evaluate the project network to detect the emergence of the new critical path and compress that path instead of the old critical path. If multiple critical paths arise, you must find ways of compressing these concurrently and by identical amounts. For example, if an activity or a set of activities caps all critical paths, then the next compression iteration would target them.
您可以不断重复压缩项目,直到满足以下条件之一:
You can keep repeatedly compressing the project until one of the following conditions is met:
您已经满足了预期的期限,因此设计更昂贵、更短的项目就没那么有意义了。
You have met the desired deadline, so there is no point in designing even more expensive and shorter projects.
该项目的计算成本超出了为该项目设定的预算。
The calculated cost of the project exceeds the budget set for the project.
压缩的项目网络非常复杂,任何项目经理或团队都不太可能完成它。
The compressed project network is so complex that it is unlikely any project manager or team could deliver on it.
压缩解决方案的持续时间比正常解决方案短 30% 以上(甚至 25%)。如前所述,在实践中,任何项目的压缩程度都有一个自然极限。
The duration of the compressed solution is more than 30% (or even 25%) shorter than that of the normal solution. As noted earlier, there is a natural limit to how much in practice you can compress any project.
压缩解决方案风险太大或风险略有下降,因为您已经过了最大风险点。这需要能够量化项目设计解决方案的风险(下一章讨论)。
The compressed solutions are too risky or risk is decreasing slightly because you are past the point of maximum risk. This requires the ability to quantify the risk of the project design solutions (discussed in the next chapter).
您已经没有其他想法或选项来进一步压缩项目。没有更多内容可以压缩了。
You have run out of ideas or options for compressing the project any further. There is nothing more to compress.
出现了太多关键路径或所有网络路径都变得关键。
Too many critical paths have emerged or all network paths have become critical.
您可以找到仅在关键路径之外压缩活动的方法。压缩解决方案的持续时间与前一个解决方案相同,但成本更高。您已达到项目的完全压缩点。
You can find ways of compressing activities only outside the critical path. The compressed solution is at the same duration as the previous one but is more expensive. You have reached the full compression point of the project.
这一系列压缩的项目解决方案可让您更好地对项目进行建模,并了解当时间和成本这两个边界条件发生变化时项目的表现。通常,只需要正常解决方案左侧的两三个点就可以了解项目的行为方式。项目越复杂或越昂贵,您就越应该投入更多精力去了解项目,因为即使是微小的错误也会产生严重的影响。
The series of compressed project solutions allows you to better model the project and to understand how it behaves in the face of changes to its boundary conditions of time and cost. Often, it takes only two or three points left of the normal solution to understand how the project behaves. The more complex or expensive the project, the more you should invest in understanding the project, because even minute mistakes have drastic implications.
如第 9 章所示,每个项目总是有几种设计方案,它们提供不同的时间和成本组合。其中一些方案可能比其他方案更具侵略性或风险更大。本质上,每个项目设计方案都是三维空间中的一个点,其轴是时间、成本和风险。决策者在选择项目设计方案时应该能够考虑到风险——事实上,他们必须能够这样做。当你设计一个项目时,你必须能够量化选项的风险。
As demonstrated in Chapter 9, every project always has several design options that offer different combinations of time and cost. Some of these options will likely be more aggressive or riskier than other options. In essence, each project design option is a point in a three-dimensional space whose axes are time, cost, and risk. Decision makers should be able to take the risk into account when choosing a project design option—in fact, they must be able to do so. When you design a project, you must be able to quantify the risk of the options.
大多数人都认识到风险轴,但由于无法衡量或量化风险轴,因此往往会忽略它。这必然会导致将二维模型(时间和成本)应用于三维问题(时间、成本和风险)而导致的不良结果。本章探讨如何使用一些建模技术客观轻松地衡量风险。您将看到风险如何与时间和成本相互作用,如何降低项目风险,以及如何找到项目的最佳设计点。
Most people recognize the risk axis but tend to ignore it since they cannot measure or quantify it. This invariably leads to poor results caused by applying a two-dimensional model (time and cost) to a three-dimensional problem (time, cost, and risk). This chapter explores how to measure risk objectively and easily using a few modeling techniques. You will see how risk interacts with time and cost, how to reduce the risk of the project, and how to find the optimal design point for the project.
风险建模的最终目的是从风险、时间和成本的角度衡量项目设计方案,以评估这些方案的可行性。一般来说,风险是选择方案的最佳标准。
The ultimate objective of risk modeling is to weigh project design options in light of risk as well as time and cost so as to evaluate the feasibility of these options. In general, risk is the best criterion for choosing between options.
例如,考虑同一个项目的两个选项:第一个选项需要 12 个月和 6 名开发人员,第二个选项需要 18 个月和 4 名开发人员。如果你对这两个选项的了解仅此而已,那么大多数人会选择第一个选项,因为两个选项最终的成本相同(6 个人年),而且第一个选项交付速度更快(前提是您有足够的现金流)。现在假设您知道第一个选项的成功率只有 15%,而第二个选项的成功率有 70%。您会选择哪个选项?举一个更极端的例子,假设第二个选项需要 24 个月和 6 名开发人员,成功率同样为 70%。尽管第二个选项现在的成本是原来的两倍,所需时间也是原来的两倍,但大多数人都会直觉地选择该选项。这是一个简单的证明,人们通常根据风险而不是时间和成本来选择选项。
For example, consider two options for the same project: The first option calls for 12 months and 6 developers, and the second option calls for 18 months and 4 developers. If this is all that you know about the two options, most people will choose the first option since both options end up costing the same (6 man-years) and the first option delivers much faster (provided you have the cash flow to afford it). Now suppose you know the first option has only a 15% chance of success and the second option has a 70% chance of success. Which option would you choose? As an even more extreme example, suppose the second option calls for 24 months and 6 developers with the same 70% chance of success. Although the second option now costs twice as much and takes twice as long, most people will intuitively choose that option. This is a simple demonstration that often people choose an option based on risk, rather than based on time and cost.
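The cost figures in this example follow from multiplying duration by headcount. A small sketch makes the arithmetic explicit; the function name and option labels are illustrative only.

```python
def cost_in_man_years(months: float, developers: int) -> float:
    """Effort of a constant-size team, expressed in man-years."""
    return months * developers / 12


# The three options discussed above: (duration in months, developers).
options = {
    "first": (12, 6),
    "second": (18, 4),
    "second, extreme variant": (24, 6),
}

costs = {name: cost_in_man_years(m, d) for name, (m, d) in options.items()}
for name, cost in costs.items():
    print(f"{name}: {cost} man-years")
```

The first two options both cost 6 man-years, while the extreme variant costs 12 man-years, twice as much as the first option and with twice the duration.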
正如项目有时间-成本曲线一样,项目也有时间-风险曲线。理想曲线如图10-1中的虚线所示。
Just as the project has a time–cost curve, it also has a time–risk curve. The ideal curve is shown in Figure 10-1 by the dashed line.
图 10-1理想的时间-风险曲线
Figure 10-1 Ideal time–risk curves
随着项目压缩,较短的项目设计方案会带来更高的风险,而且风险增加的速度可能是非线性的。这就是为什么图 10-1中的虚线会随着垂直风险轴向上弯曲,并随着时间的推移而向下松弛。然而,这种直观的虚线是错误的。实际上,时间风险曲线是某种逻辑函数,即图 10-1中的实线。
As you compress the project, the shorter project design solutions carry with them an increased level of risk, and the rate of increase is likely nonlinear. This is why the dashed line in Figure 10-1 curves up toward the vertical risk axis and relaxes downward with time. However, this intuitive dashed line is wrong. In reality, a time–risk curve is a logistic function of some kind, the solid line in Figure 10-1.
逻辑函数是一种更优越的模型,因为它更贴近地捕捉了复杂系统中风险的一般行为。例如,如果我要绘制由于压缩正常准备时间而导致今晚晚餐烧焦的风险,风险曲线将看起来像图 10-1中的实线。每种压缩技术(例如将烤箱温度设置得太高、将托盘放得太靠近加热元件、选择更容易烹饪但更易燃的食物、不预热烤箱等)都会增加晚餐烧焦的风险。如实线所示,由于累积压缩,在某个时刻晚餐烧焦的风险几乎达到最大值甚至趋于平稳,因为晚餐肯定会烧焦。同样,如果我决定甚至不进入厨房,那么风险就会急剧下降。如果风险由虚线决定,那么我总是有机会不烧焦晚餐,因为我总是可以通过进一步压缩来不断增加风险。
The logistic function is a superior model because it more closely captures the general behavior of risk in complex systems. For example, if I were to plot the risk of me burning dinner tonight due to compressing the normal preparation time, the risk curve would look like the solid line in Figure 10-1. Each compression technique (such as setting the oven temperature too high, placing the tray too close to the heating element, choosing easier-to-cook but more flammable food, not preheating the oven, and so on) increases the risk of burning dinner. As shown by the solid line, the risk of a burnt dinner due to the cumulative compression at some point is almost maximized and even flattens out, because dinner is certain to burn. Similarly, if I decide not to even enter the kitchen, then the risk would drop precipitously. If the risk were dictated by the dashed line, I would always have some chance of not burning dinner since I could always keep increasing the risk by compressing it further.
请注意,逻辑函数有一个临界点,在该点风险急剧增加(类似于进入厨房的决定)。相比之下,虚线则保持逐渐增加,没有明显的临界点。
Note that the logistic function has a tipping point where the risk drastically increases (the analog to the decision to enter the kitchen). The dashed line, by contrast, keeps increasing gradually and does not have a noticeable tipping point.
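To see the shape being described, the sketch below evaluates a standard logistic function that decreases with duration. It is only an illustration of the idealized curve: the `midpoint` and `steepness` parameters and the normalization of risk to the range 0 to 1 are assumptions, not values taken from the text.

```python
import math


def logistic_risk(duration: float, midpoint: float, steepness: float = 1.0) -> float:
    """Idealized risk in (0, 1) that falls as the duration grows.

    midpoint is the tipping point, where risk is exactly 0.5 (hypothetical parameter).
    """
    return 1.0 / (1.0 + math.exp(steepness * (duration - midpoint)))


# Hypothetical project with a tipping point at 10 months.
for months in (4, 8, 10, 12, 16):
    print(f"{months:>2} months: risk {logistic_risk(months, midpoint=10):.3f}")
```

Near the tipping point the risk changes sharply, while at both ends it flattens out, approaching its maximum for heavily compressed durations and its minimum for long ones.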
事实证明,即使是图 10-1中的逻辑函数仍然是理想化的时间风险曲线。实际的时间风险曲线更像图 10-2中所示的曲线。将此曲线与项目的直接成本曲线叠加,可以最好地解释此曲线形状的原因。由于项目行为是三维的,因此图 10-2依赖于第二个y轴来表示风险。
It turns out that even the logistic function in Figure 10-1 is still an idealized time–risk curve. The actual time–risk curve is more like that shown in Figure 10-2. The reason for the shape of this curve is best explained by overlaying it with the project’s direct cost curve. Since the project behavior is three-dimensional, Figure 10-2 relies on a secondary y-axis for the risk.
图 10-2实际时间-成本-风险曲线
Figure 10-2 Actual time–cost–risk curve
图 10-2中的垂直虚线表示正常解决方案的持续时间以及项目的最低直接成本解决方案。请注意,正常解决方案通常会牺牲一定数量的浮动时间以减少人员配备。浮动时间的减少表现为风险水平的提高。
The vertical dashed line in Figure 10-2 indicates the duration of the normal solution as well as the minimum direct cost solution for the project. Note that the normal solution usually trades some amount of float to reduce staffing. The reduction in float manifests in an elevated level of risk.
正常解决方案左侧是较短的压缩解决方案。压缩解决方案的风险也更大,因此风险曲线在正常解决方案左侧上升。风险上升然后趋于平稳(理想逻辑函数的情况就是如此)。但是,与理想行为不同,实际风险曲线在最短持续时间点之前达到最大值,甚至会略有下降,使其呈凹形。虽然这种行为违反直觉,但它之所以发生,是因为一般来说,较短的项目更安全,我把这种现象称为达芬奇效应。列奥纳多·达芬奇在研究金属丝的抗拉强度时发现,较短的金属丝比较长的金属丝更坚固(这是因为出现缺陷的概率与金属丝的长度成正比)。[1] 项目也是如此。为了说明这一点,请考虑完成 10 人年项目的两种可能方式:1 人工作 10 年或 3650 人工作 1 天。假设两者都是可行的项目(人员可用,您有时间,等等),那么 1 天项目比 10 年项目安全得多。一天内发生坏事的可能性尚有争议,但在 10 年内这几乎是必然的。我将在本章后面提供对这种行为的更量化的解释。
To the left of the normal solution are the shorter, compressed solutions. The compressed solutions are also riskier, so the risk curve increases to the left of the normal solution. The risk rises and then levels off (as is the case with the ideal logistic function). However, unlike the ideal behavior, the actual risk curve gets maximized before the point of minimum duration and even drops a bit, giving it a concave shape. While such behavior is counterintuitive, it occurs because in general, shorter projects are somewhat safer, a phenomenon I call the da Vinci effect. When investigating the tensile strength of wires, Leonardo da Vinci found that shorter wires are stronger than longer wires (because the probability of a defect is proportional to the length of the wire).[1] By analogy, the same is true for projects. To illustrate the point, consider two possible ways of delivering a 10-man-year project: 1 person for 10 years or 3650 people for 1 day. Assuming both are viable projects (that the people are available, that you have the time, and so on), the 1-day project is much safer than the 10-year project. The likelihood of something bad happening in a single day is open for debate, but it is a near certainty with 10 years. I provide a more quantified explanation for this behavior later in this chapter.
1. William B. Parsons, Engineers and Engineering in the Renaissance (Cambridge, MA: MIT Press, 1939); Jay R. Lund and Joseph P. Byrne, Leonardo da Vinci’s Tensile Strength Tests: Implications for the Discovery of Engineering Mechanics (Department of Civil and Environmental Engineering, University of California, Davis, July 2000).
To the right of the normal solution, the risk goes down, at least initially. For example, giving an extra week to a one-year project will reduce the risk of not meeting that commitment. However, if you keep giving the project more time, at some point Parkinson’s law will take effect and drastically increase the risk. So, to the right of the normal solution, the risk curve goes down, becomes minimized at some value greater than zero, and then starts climbing again, giving it a convex shape.
This chapter presents my techniques for modeling and quantifying risk. These models complement each other in how they measure the risk. You often need more than one model to help you choose between options—no model is ever perfect. However, each of the risk models should yield comparable results.
Risk values are always relative. For example, jumping off a fast-moving train is risky. However, if that train is about to go over a cliff, jumping is the most sensible thing to do. Risk has no absolute value, so you can evaluate it only in comparison with other alternatives. You should therefore talk about a “riskier” project as opposed to a “risky” project. Similarly, nothing is really safe. The only safe way of doing any project is not doing it. You should therefore talk about a “safer” project rather than a “safe” project.
The whole point of evaluating risk is to be able to compare options and projects, which requires comparing numbers. The first decision I made when creating the models was to normalize risk to the numerical range of 0 to 1.
A risk value of 0 does not mean that the project is risk-free. A risk value of 0 means that you have minimized the risk of the project. Similarly, a risk value of 1 does not mean that the project is guaranteed to fail, but simply that you have maximized the risk of the project.
The risk value also does not indicate a probability of success. With probability, a value of 1 means a certainty, and a value of 0 means an impossibility. A project with a risk value of 1 can still deliver, and a project with a risk value of 0 can still fail.
The floats of the various activities in the network provide an objective way of measuring the risk of the project, and the previous chapters have referred to floats when discussing risk. Two different project design options will differ in their floats and, therefore, may drastically differ in their risk as well. As an example, consider the two project design options shown in Figure 10-3.
Figure 10-3 Two project options
Both of these options are valid project design options for building the same system. The only information available in Figure 10-3 is the color-coded floats of the two networks. Now, ask yourself: With which project would you rather be involved? Everyone to whom I have shown these two charts preferred the greener option on the right-hand side of Figure 10-3. What is interesting is that no one has ever asked what the difference in duration and cost between these two options was. Even when I volunteered that the greener option was both 30% longer and more expensive, that information did not affect the preference. No one chose the low-float, high-stress, and high-risk project shown on the left in Figure 10-3.
Your project faces multiple types of risk. There is staffing risk (Will the project actually get the level of staffing it requires?). There is duration risk (Will the project be allowed the duration it requires?). There is technological risk (Will the technology be able to deliver?). There are human factors (Is the team technically competent, and can its members work together?). There is always an execution risk (Can the project manager correctly execute the project plan?).
These types of risk are independent of the kind of risk you assess using floats. Any project design solution always assumes that the organization or the team will have what it takes to deliver on the planned schedule and cost and that the project will receive the required time and resources. The remaining type of risk pertains to how well the project will handle the unforeseen. I call this kind of risk design risk.
Design risk assesses the project’s sensitivity to schedule slips of activities and to your ability to meet your commitments. Design risk therefore quantifies the fragility of the project or the degree to which the project resembles a house of cards. Using floats to measure risk is actually quantifying that design risk.
The project risk measurements usually correlate to the direct cost and duration of the various solutions. In most projects, the indirect cost is independent of the project risk. The indirect cost keeps mounting with the duration of the project even if the risk is very low. Therefore, this chapter refers to only direct cost.
The criticality risk model attempts to quantify the intuitive impression of risk when you evaluate the options of Figure 10-3. For this risk model you classify activities in the project into four risk categories, from most to least risk:
Critical activities. The critical activities are obviously the riskiest activities because any delay with a critical activity always causes schedule and cost overruns.
High risk activities. Low float, near-critical activities are also risky because any delay in them is likely to cause schedule and cost overruns.
Medium risk activities. Activities with a medium level of float have medium level of risk and can sustain some delays.
Low risk activities. Activities with high floats are the least risky and can sustain even large delays without derailing the project.
You should exclude activities of zero duration (such as milestones and dummies) from this analysis because they add nothing to the risk of the project. Moreover, unlike real activities, they are simply artifacts of the project network.
Chapter 8 showed how to use color coding to classify activities based on their float. You can use the same technique for evaluating the sensitivity or fragility of activities by color coding the four risk categories. With the color coding in place, assign a weight to the criticality of each activity. The weight acts as a risk factor. You are, of course, at liberty to choose any weights that signify the difference in risk. One possible allocation of weights is shown in Table 10-1.
Table 10-1 Criticality risk weights
| Activity Color | Weight |
|---|---|
| Black (critical) | 4 |
| Red (high risk) | 3 |
| Yellow (medium risk) | 2 |
| Green (low risk) | 1 |
The criticality risk formula is:

$$\text{Risk} = \frac{W_C \cdot N_C + W_R \cdot N_R + W_Y \cdot N_Y + W_G \cdot N_G}{W_C \cdot N}$$

where:
$W_C$ is the weight of the black, critical activities.
$W_R$ is the weight of the red, low-float activities.
$W_Y$ is the weight of the yellow, medium-float activities.
$W_G$ is the weight of the green, high-float activities.
$N_C$ is the number of black, critical activities.
$N_R$ is the number of red, low-float activities.
$N_Y$ is the number of yellow, medium-float activities.
$N_G$ is the number of green, high-float activities.
$N$ is the number of activities in the project ($N = N_C + N_R + N_Y + N_G$).
Substituting the weights from Table 10-1, the criticality risk formula is:

$$\text{Risk} = \frac{4 \cdot N_C + 3 \cdot N_R + 2 \cdot N_Y + 1 \cdot N_G}{4 \cdot N}$$
Applying the criticality risk formula to the network in Figure 10-4 yields:
Figure 10-4 Sample network for risk calculation
The maximum value of the criticality risk is 1.0; it occurs when all activities in the network are critical. In such a network, $N_R$, $N_Y$, and $N_G$ are zero, and $N_C$ equals $N$:

$$\text{Risk} = \frac{W_C \cdot N + W_R \cdot 0 + W_Y \cdot 0 + W_G \cdot 0}{W_C \cdot N} = 1.0$$
The minimum value of the criticality risk is $W_G$ over $W_C$; it occurs when all activities in the network are green. In such a network, $N_C$, $N_R$, and $N_Y$ are zero, and $N_G$ equals $N$:

$$\text{Risk} = \frac{W_C \cdot 0 + W_R \cdot 0 + W_Y \cdot 0 + W_G \cdot N}{W_C \cdot N} = \frac{W_G}{W_C}$$
Using the weights from Table 10-1, the minimum value of risk is 0.25. The criticality risk, therefore, can never be zero: A weighted average such as this will always have a minimum value greater than zero as long as the weights themselves are greater than zero. This is not necessarily a bad thing, as the project risk should never be zero. The formula implies the lowest range of risk values is too low to achieve, which is reasonable since anything worth doing requires risk.
As long as you can rationalize your choice of weights, the criticality risk model will likely work. For example, the set of weights [21, 22, 23, 24] is a poor choice because 24 is only about 14% larger than 21; thus, this set does not emphasize the risk of the critical activities relative to the green ones. Furthermore, the minimum risk using these weights ($W_G / W_C$ = 21/24) is 0.88, which is obviously too high. I find the weights set [1, 2, 3, 4] to be as good as any other sensible choice.
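The weighted average above is easy to check numerically. The following is a minimal Python sketch, not code from the book: the function name `criticality_risk`, the dictionary layout, and the activity counts are all assumptions made for illustration, and the counts are not read from any figure.

```python
def criticality_risk(counts, weights):
    """Return the criticality risk for the given activity counts per color.

    Both arguments are dicts keyed by "black", "red", "yellow", "green".
    Zero-duration activities (milestones, dummies) are assumed to be
    excluded from the counts already.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the network has no activities to evaluate")
    weighted = sum(weights[color] * n for color, n in counts.items())
    return weighted / (weights["black"] * total)


TABLE_10_1 = {"black": 4, "red": 3, "yellow": 2, "green": 1}
POOR_WEIGHTS = {"black": 24, "red": 23, "yellow": 22, "green": 21}

# Hypothetical counts, chosen only to exercise the formula.
sample = {"black": 6, "red": 4, "yellow": 2, "green": 4}
all_critical = {"black": 12, "red": 0, "yellow": 0, "green": 0}
all_green = {"black": 0, "red": 0, "yellow": 0, "green": 12}

print(criticality_risk(sample, TABLE_10_1))        # (24 + 12 + 4 + 4) / 64
print(criticality_risk(all_critical, TABLE_10_1))  # upper bound of the model
print(criticality_risk(all_green, TABLE_10_1))     # W_G / W_C with Table 10-1
print(criticality_risk(all_green, POOR_WEIGHTS))   # W_G / W_C with the poor set
```

With the Table 10-1 weights, the all-green network lands on the 0.25 floor, while the [21, 22, 23, 24] set cannot drop below 21/24, which is why that set discriminates so poorly between green and critical activities.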
The criticality risk model often requires some customization and judgment calls. First, as mentioned in Chapter 8, the ranges of the various colors (the criteria for red, yellow, and green activities) must be appropriate for the duration of your project. Second, you should consider defining very-low-float or near-critical activities (such as those with 1 day of float) as critical because these basically have the same risk as critical activities. Third, even if some activities’ floats are not near-critical, you should examine the chain on which the activities reside and adjust it accordingly. For example, if you have a year-long chain of many activities and the chain has only 10 days of float, you should classify each activity on the chain as a critical activity for risk calculation. A slip with one activity up that chain will consume all float, turning all downstream activities into critical activities.
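The first two judgment calls can be captured in a small classification helper. This is a sketch under stated assumptions: the thresholds `red_max` and `yellow_max` and the `near_critical` cutoff are hypothetical defaults that you would calibrate to the duration of your own project, and the `(duration, float)` tuple layout is invented for the example. Chain-level adjustments (the third judgment call) are not modeled here.

```python
from collections import Counter


def classify(float_days, near_critical=1, red_max=10, yellow_max=25):
    """Map an activity's float (in days) to a risk color.

    All thresholds are hypothetical. Activities at or below the
    near_critical cutoff are treated as critical.
    """
    if float_days <= near_critical:
        return "black"
    if float_days <= red_max:
        return "red"
    if float_days <= yellow_max:
        return "yellow"
    return "green"


def color_counts(activities, **thresholds):
    """Count activities per color, skipping zero-duration activities.

    activities is an iterable of (duration_days, float_days) tuples.
    """
    counts = Counter({"black": 0, "red": 0, "yellow": 0, "green": 0})
    for duration, float_days in activities:
        if duration == 0:  # milestones and dummies add no risk
            continue
        counts[classify(float_days, **thresholds)] += 1
    return dict(counts)


# Hypothetical activities: (duration, float)
activities = [(5, 0), (3, 1), (0, 0), (4, 7), (2, 20), (6, 30), (6, 60)]
print(color_counts(activities))
```

Note that the activities with 30 and 60 days of float land in the same green bin, and that the 1-day-float activity is counted as critical.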
The Fibonacci series is a sequence of numbers in which every item in the series equals the sum of the previous two, with the exception that the first two values are defined as 1:

$$\mathrm{Fib}_i = \mathrm{Fib}_{i-1} + \mathrm{Fib}_{i-2}, \qquad \mathrm{Fib}_1 = \mathrm{Fib}_2 = 1$$
This recursive definition yields the series of 1, 1, 2, 3, 5, 8, 13, ….
The ratio between two (sufficiently large) consecutive Fibonacci numbers approaches an irrational number known as phi (the Greek letter φ), whose value is 1.618..., so the series can be approximated as:

$$\mathrm{Fib}_i \approx \varphi \cdot \mathrm{Fib}_{i-1}$$
Since ancient times, φ has been known as the golden ratio. It is observed throughout nature and human enterprises alike. Two famous (and quite disparate) examples based on the golden ratio are the way the invertebrate nautilus’s shell spirals and the way markets retrace their former price levels.
Notice that the weights in Table 10-1 are similar to the beginning values of the Fibonacci series. As an alternative to Table 10-1, you can choose any four consecutive (sufficiently large) members from the Fibonacci series (such as [89, 144, 233, 377]) as weights. Regardless of your choice, when you use them to evaluate the network in Figure 10-4, the risk will always be approximately 0.64 because the weights maintain the ratio of φ. If $W_G$ is the weight of the green activities, the other weights are:

$$W_Y = \varphi \cdot W_G \qquad W_R = \varphi^2 \cdot W_G \qquad W_C = \varphi^3 \cdot W_G$$

and the criticality risk formula can be written as:

$$\text{Risk} = \frac{\varphi^3 \cdot W_G \cdot N_C + \varphi^2 \cdot W_G \cdot N_R + \varphi \cdot W_G \cdot N_Y + W_G \cdot N_G}{\varphi^3 \cdot W_G \cdot N}$$

Since $W_G$ appears in all elements of the numerator and the denominator, the equation can be simplified:

$$\text{Risk} = \frac{\varphi^3 \cdot N_C + \varphi^2 \cdot N_R + \varphi \cdot N_Y + N_G}{\varphi^3 \cdot N}$$

Approximating the value of φ, the formula is reduced to:

$$\text{Risk} = \frac{4.24 \cdot N_C + 2.62 \cdot N_R + 1.62 \cdot N_Y + N_G}{4.24 \cdot N}$$
I call this risk model the Fibonacci risk model.
The maximum value that the Fibonacci risk formula can reach is 1.0 in an all-critical network. The minimum value that it can reach is 0.24 (1/4.24), slightly less than the minimum criticality risk model value of 0.25 (when using the set [1, 2, 3, 4] for weights). This supports the notion that risk has a natural lower limit of about 0.25.
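A short numerical check makes the role of φ concrete. The sketch below is illustrative only: the function names and the activity counts are assumptions, and the counts are hypothetical rather than taken from Figure 10-4. It compares the φ-based formula against explicit Fibonacci weights, both large and small.

```python
PHI = (1 + 5 ** 0.5) / 2  # golden ratio, about 1.618


def fibonacci_risk(n_critical, n_red, n_yellow, n_green):
    """Criticality risk with weights 1, phi, phi^2, phi^3 (green to critical)."""
    total = n_critical + n_red + n_yellow + n_green
    if total == 0:
        raise ValueError("the network has no activities to evaluate")
    numerator = (PHI ** 3 * n_critical + PHI ** 2 * n_red
                 + PHI * n_yellow + n_green)
    return numerator / (PHI ** 3 * total)


def weighted_risk(counts, weights):
    """Criticality risk; counts and weights are ordered
    (critical, red, yellow, green)."""
    total = sum(counts)
    return sum(w * n for w, n in zip(weights, counts)) / (weights[0] * total)


counts = (6, 4, 2, 4)  # hypothetical activity counts

from_phi = fibonacci_risk(*counts)
from_large = weighted_risk(counts, (377, 233, 144, 89))  # large members
from_small = weighted_risk(counts, (5, 3, 2, 1))         # small members
floor = fibonacci_risk(0, 0, 0, 16)                      # all-green network

print(round(from_phi, 4), round(from_large, 4), round(from_small, 4))
print(round(floor, 4))
```

For these counts, the large Fibonacci members agree with the φ-based formula to several decimal places, while the small members [1, 2, 3, 5] drift slightly because their ratios have not yet converged to φ. The all-green floor is 1/φ³, which rounds to the 0.24 quoted above.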
The criticality risk model uses broad risk categories. For example, if you define float greater than 25 days as green, then two activities (one with 30 days of float and the other with 60 days of float) will be placed in the same green bin and will have the same risk value. To better account for the risk contribution of each individual activity, I created the activity risk model. This model is far more discrete than the criticality risk model.
The activity risk formula is:

$$\text{Risk} = 1 - \frac{F_1 + \dots + F_N}{M \cdot N} = 1 - \frac{\sum_{i=1}^{N} F_i}{M \cdot N}$$

where:
$F_i$ is the float of activity $i$.
$N$ is the number of activities in the project.
$M$ is the maximum float of any activity in the project, or $\max(F_1, F_2, \dots, F_N)$.
As with the criticality risk, you should exclude activities of zero duration (milestones and dummies) from this analysis.
Applying the activity risk formula to the network in Figure 10-4 yields:
The activity risk model is undefined when all activities are critical. However, at the limit, given a large network (large $N$) that includes only one noncritical activity with float $M$, the model approaches 1.0:

$$\text{Risk} = 1 - \frac{F_1 + \dots + F_N}{M \cdot N} = 1 - \frac{M}{M \cdot N} = 1 - \frac{1}{N} \approx 1.0$$
The minimum value of the activity risk is 0 when all activities in the network have the same level of float, $M$:

$$\text{Risk} = 1 - \frac{\sum_{i=1}^{N} M}{M \cdot N} = 1 - \frac{M \cdot N}{M \cdot N} = 0$$
While activity risk can in theory reach zero, in practice it is unlikely that you will encounter such a project because all projects always have some non-zero amount of risk.
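The activity risk model and its boundary cases can be sketched in a few lines of Python. The function name `activity_risk` and all float values below are hypothetical, chosen only to exercise the formula; zero-duration activities are assumed to have been removed beforehand.

```python
def activity_risk(floats):
    """Return 1 - sum(F_i) / (M * N) for a list of activity floats.

    floats holds one float value per real activity (zero-duration
    activities are assumed to be excluded by the caller).
    """
    if not floats:
        raise ValueError("the network has no activities to evaluate")
    largest = max(floats)
    if largest == 0:
        # M is zero, so the formula divides by zero.
        raise ValueError("undefined when all activities are critical")
    return 1 - sum(floats) / (largest * len(floats))


# Hypothetical floats in days.
mixed = [0] * 6 + [5] * 4 + [10] * 2 + [30] * 4
uniform = [12] * 8                   # every activity has the same float
nearly_critical = [0] * 99 + [20]    # one noncritical activity out of 100

print(round(activity_risk(mixed), 2))
print(activity_risk(uniform))
print(round(activity_risk(nearly_critical), 2))
```

The uniform network evaluates to 0, and the network with a single noncritical activity evaluates to 1 − 1/N, which approaches 1.0 as N grows.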
The activity risk model works well only when the floats of the project's activities are more or less uniformly spread between the smallest float and the largest float in the network. An outlier float value that is significantly higher than all other floats will skew the calculation, producing an incorrectly high risk value. For example, consider a one-year project that has a single week-long activity that can take place anywhere between the beginning and the end of the project. Such an activity will have almost a year's worth of float, as illustrated in the network in Figure 10-5.
Figure 10-5 Network with outlier high float activity
Figure 10-5 shows the critical path (bold black) and, below it, many activities with some color-coded level of float ($F_i$). The activity shown above the critical path is itself short but has an enormous amount of float, $M$.
Since $M$ is much larger than any other $F_i$, the activity risk formula yields a number approaching 1:

$$\text{Risk} = 1 - \frac{F_1 + \dots + F_N}{M \cdot N} \approx 1 - \frac{M}{M \cdot N} = 1 - \frac{1}{N} \approx 1$$
The next chapter demonstrates this situation and provides an easy and effective way of detecting and adjusting the float outliers.
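To see the skew numerically, consider the following sketch. The float values are hypothetical, and the `cap_floats` helper is only a naive illustration of what adjusting an outlier could look like; it is an assumption of this example and is not the detection and adjustment technique presented in the next chapter.

```python
def activity_risk(floats):
    """Return 1 - sum(F_i) / (M * N); see the activity risk formula."""
    largest = max(floats)
    if largest == 0:
        raise ValueError("undefined when all activities are critical")
    return 1 - sum(floats) / (largest * len(floats))


def cap_floats(floats, ceiling):
    """Naive, hypothetical adjustment: clamp every float to a ceiling."""
    return [min(f, ceiling) for f in floats]


# Hypothetical one-year project: 10 critical activities, 5 activities with
# moderate float, and one short activity with 250 days of float.
floats = [0] * 10 + [20, 25, 30, 35, 40] + [250]

skewed = activity_risk(floats)
adjusted = activity_risk(cap_floats(floats, ceiling=40))

print(round(skewed, 2))    # dominated by the single outlier
print(round(adjusted, 2))  # after clamping the outlier to 40 days
```

With the outlier present, M is 250 and every other float looks negligible in comparison, so the model reports a risk of about 0.9. Clamping that single value brings the result down to roughly 0.7, which better reflects the spread of the remaining floats.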
The activity risk model also produces an incorrectly low risk value when the project does not have many activities and the floats of the noncritical activities are all similar or even identical in value. However, except for these rare, somewhat contrived examples, the activity risk model measures the risk correctly.
For decent-size real-life projects, the criticality and activity risk models yield very similar results. Each model has pros and cons. In general, criticality risk reflects human intuition better, while activity risk is more attuned to the differences between individual activities. Criticality risk modeling often requires calibration or judgment calls, but it is indifferent to how uniformly the floats are spread. Activity risk is sensitive to the presence of large outlier floats, but it is easy to calculate and does not require much calibration. You can even automate the adjustment of float outliers.
As discussed previously, risk decreases slightly with high compression, reflecting the intuitive observation that shorter projects are safer. Quantified risk modeling offers an explanation for this phenomenon. The only practical way of highly compressing a software project is to introduce parallel work. Chapter 9 listed several ideas for engaging in parallel work, such as splitting activities and performing the less-dependent phases in parallel to other activities or introducing additional activities that enable the parallel work. Figure 10-6 shows this effect in a qualitative manner.
Figure 10-6 High compression makes the network more parallel
Figure 10-6 depicts two networks, with the bottom diagram being the compressed version of the top diagram. The compressed solution has fewer critical activities, a shorter critical path, and more noncritical activities in parallel. When measuring the risk of such compressed projects, the presence of more activities with float and fewer critical activities will decrease the risk value produced by both the criticality and activity risk models.
While the design risk of a highly parallel project may be lower than the design risk of a less compressed solution, such a project is more challenging to execute because of the additional dependencies and the increased number of activities that need to be scheduled and tracked. Such a project will have demanding scheduling constraints and require a larger team. In essence, a highly compressed project has converted design risk into execution risk. You should measure the execution risk as well as the design risk. A good proxy for the expected execution risk is the complexity of the network. Chapter 12 discusses how to quantify execution complexity.
While compressing the project is likely to increase the risk, the opposite is also true (up to a point): By relaxing the project, you can decrease its risk. I call this technique risk decompression. You deliberately design the project for a later delivery date by introducing float along the critical path. Risk decompression is the best way to reduce the project’s fragility, its sensitivity to the unforeseen.
You should decompress the project when the available solutions are too risky. Other reasons for decompressing the project include concerns about the present prospects based on a poor past track record, facing too many unknowns, or a volatile environment that keeps changing its priorities and resources.
As discussed in Chapter 7, a classic mistake when trying to reduce risk is to pad estimations. This will actually make matters worse and decrease the probability of success. The whole point of decompression is to keep the original estimations unchanged and instead increase the float along all network paths.
At the same time, you should not over-decompress. Using the risk models, you can measure the effect of the decompression and stop when you reach your decompression target (discussed later in this section). Excessive decompression will have diminishing returns when all activities have high float. Any additional decompression beyond this point will not reduce the design risk, but will increase the overall overestimation risk and waste time.
You can decompress any project design solution, although you typically decompress only the normal solution. Decompression pushes the project a bit into the uneconomical zone (see Figure 10-2), increasing the project’s time and cost. When you decompress a project design solution, you still design it with the original staffing. Do not be tempted to consume the additional decompression float and reduce the staff—that defeats the purpose of risk decompression in the first place.
A straightforward way of decompressing the project is to push the last activity or the last event in the project down the timeline. This adds float to all prior activities in the network. In the case of the network depicted in Figure 10-4, decompressing activity 16 by 10 days results in a criticality risk of 0.47 and an activity risk of 0.52. Decompressing activity 16 by 30 days results in a criticality risk of 0.3 and an activity risk of 0.36.
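The effect of pushing the last activity down the timeline can be simulated by adding the same number of days to every float and re-evaluating both models. The sketch below is a simplification under stated assumptions: the float values and the color thresholds in `classify` are hypothetical, every activity is assumed to gain the full decompression, and the results will therefore differ from the values quoted for Figure 10-4.

```python
WEIGHTS = {"black": 4, "red": 3, "yellow": 2, "green": 1}  # Table 10-1


def classify(float_days):
    """Hypothetical color thresholds; calibrate them to your project."""
    if float_days == 0:
        return "black"
    if float_days <= 9:
        return "red"
    if float_days <= 25:
        return "yellow"
    return "green"


def criticality_risk(floats):
    weighted = sum(WEIGHTS[classify(f)] for f in floats)
    return weighted / (WEIGHTS["black"] * len(floats))


def activity_risk(floats):
    largest = max(floats)
    if largest == 0:
        raise ValueError("undefined when all activities are critical")
    return 1 - sum(floats) / (largest * len(floats))


def decompress(floats, days):
    """Model a later delivery date: every activity gains `days` of float."""
    return [f + days for f in floats]


# Hypothetical floats (in days) of a 16-activity network.
floats = [0] * 6 + [5] * 4 + [10] * 2 + [30] * 4

results = {}
for days in (0, 10, 30):
    relaxed = decompress(floats, days)
    results[days] = (criticality_risk(relaxed), activity_risk(relaxed))
    print(days, round(results[days][0], 2), round(results[days][1], 2))
```

Both risk values fall as the decompression grows, and the criticality risk bottoms out at 0.25 once every activity has turned green. Past that point, further decompression no longer lowers the criticality risk.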
A more sophisticated technique is to also decompress one or two key activities along the critical path, such as activity 8 in Figure 10-4. In general, the further down the network you decompress, the more you need to decompress because any slip in an upstream activity can consume the float of the downstream activities. The earlier in the network you decompress, the less likely it is that all of the float you have introduced will be consumed.
When decompressing a project, you should strive to decompress until the risk drops to 0.5. Figure 10-7 demonstrates this point on the ideal risk curve using a logistic function with asymptotes at 1 and 0.
Figure 10-7 The decompression target on the ideal risk curve
When the project has a very short duration, the value of risk is almost 1.0, and the risk is maximized. At that point the risk curve is almost flat. Initially, adding time to the project does not reduce the risk by much. With more time, at some point the risk curve starts descending, and the more time you give the project, the steeper the curve gets. However, with even more time, the risk curve starts leveling off, offering less reduction in risk for additional time. The point at which the risk curve is the steepest is the point with the best return on the decompression—that is, the most reduction in risk for the least amount of decompression. This point defines the risk decompression target. Since the logistic function in Figure 10-7 is a symmetric curve between 0 and 1, the tipping point is at a risk value of exactly 0.5.
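The claim that the steepest point of a logistic curve sits at a risk of 0.5 can be verified numerically. In the sketch below, the `midpoint` and `steepness` parameters, the time range, and the function names are all assumptions made for illustration; the curve is simply a descending logistic function with asymptotes at 1 and 0.

```python
import math


def ideal_risk(t, midpoint=12.0, steepness=0.8):
    """Descending logistic curve: near 1 for short durations, near 0 for long."""
    return 1.0 / (1.0 + math.exp(steepness * (t - midpoint)))


def steepest_point(f, lo, hi, steps=2000, h=1e-4):
    """Scan [lo, hi] for the point where f has its most negative slope."""
    best_t, best_slope = lo, 0.0
    for k in range(steps + 1):
        t = lo + (hi - lo) * k / steps
        slope = (f(t + h) - f(t - h)) / (2 * h)  # central difference
        if slope < best_slope:
            best_t, best_slope = t, slope
    return best_t


t_star = steepest_point(ideal_risk, 0.0, 24.0)
print(round(t_star, 2), round(ideal_risk(t_star), 2))
```

Whatever midpoint and steepness you pick, the steepest descent coincides with the midpoint of the curve, where the risk is 0.5. This is the symmetry argument behind the decompression target.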
To determine how the decompression target relates to cost, compare the actual risk curve with the direct cost curve (Figure 10-8). The actual risk curve is confined to a narrower range than the ideal risk curve and never approaches either 0 or 1, although it behaves similarly to a logistic function between its maximum and minimum values. As discussed at the beginning of this chapter, the steepest point of the risk curve (where concave becomes convex) is at minimum direct cost, which coincides with the decompression target (Figure 10-8).
Figure 10-8 Minimum direct cost coincides with risk at 0.5
Since the risk keeps descending to the right of 0.5, you can think of 0.5 as a minimum decompression target. Again, you should monitor the behavior of the risk curve and not over-decompress.
If the minimum direct cost point of the project is also the best point risk-wise, this makes it the optimal design point for the project, offering the least direct cost at the best risk. This point is neither too risky nor too safe, benefiting as much as possible from adding time to the project.
To end this chapter, here are a few easy-to-remember metrics and rules of thumb. As is the case with every design metric, you should use them as guidelines. A violation of the metrics is a red flag, and you should always investigate its cause.
Keep risk between 0.3 and 0.75. Your project should never have extreme risk values. Obviously, a risk value of 0 or 1.0 is nonsensical. The risk should not be too low: Since the criticality risk model cannot go below 0.25, you can round the lower possible limit of 0.25 up to 0.3 as the lower bound for any project. When compressing the project, long before the risk gets to 1.0 (a fully critical project), you should stop compressing. Even a risk value of 0.9 or 0.85 is still high. If the bottom quarter of 0 to 0.25 is disallowed, then for symmetry’s sake you should avoid the top quarter of risk values between 0.75 and 1.0.
Decompress to 0.5. The ideal decompression target is a risk of 0.5, as it targets the tipping point in the risk curve.
Do not over-decompress. As discussed, decompression beyond the decompression target has diminishing returns, and over-decompression increases the risk.
Keep normal solutions under 0.7. While elevated risk may be the price you pay for a compressed solution, it is inadvisable for a normal solution. Returning to the symmetry argument, if risk of 0.3 is the lower bound for all solutions, then risk of 0.7 is the upper bound for a normal solution. You should always decompress high-risk normal solutions.
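These rules of thumb lend themselves to a simple automated check. The helper below is a hypothetical sketch: the function name, its signature, and the wording of the findings are assumptions, while the thresholds come from the metrics above. As with the metrics themselves, treat a finding as a red flag to investigate and not as a verdict.

```python
def check_risk(risk, is_normal_solution=False):
    """Return a list of red flags for a risk value in the range 0 to 1."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be normalized to the range 0 to 1")
    findings = []
    if risk < 0.3:
        findings.append("risk below 0.3: the solution may be over-decompressed")
    if risk > 0.75:
        findings.append("risk above 0.75: stop compressing")
    if is_normal_solution and risk >= 0.7:
        findings.append("normal solution at or above 0.7: decompress toward 0.5")
    return findings


for value, normal in ((0.5, True), (0.72, True), (0.8, True), (0.2, False)):
    print(value, normal, check_risk(value, is_normal_solution=normal))
```

A normal solution at 0.72 raises only the normal-solution flag, whereas one at 0.8 violates both the general upper bound and the stricter bound for normal solutions.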
You should make both risk modeling and risk metrics part of your project design. Constantly measure the risk to see where you are and where you are heading.
The difficulty facing many project design novices is not the specific design techniques and concepts, but rather the end-to-end flow of the design process. It is also easy to get mired in the details and to lose sight of the objective of the design effort. Without experience, you may be stumped when you encounter the first snag or situation that does not behave as prescribed. It is impractical to try to cover all possible contingencies and responses. Instead, it is better to master the thought process involved in project design.
This chapter demonstrates the thought process and the mindset via a comprehensive walkthrough of the design effort. The emphasis is on the systematic examination of the steps and iterations. You will see observations and rules of thumb, how to alternate between project design options, how to home in on what makes sense, and how to evaluate tradeoffs. As this chapter evolves, it demonstrates ideas from the previous chapters as well as the synergy gained by combining project design techniques. It also covers additional aspects of project design such as planning assumptions, complexity reduction, staffing and scheduling, accommodating constraints, compression, and risk and planning. As such, the objective of this chapter is teaching project design flow and techniques, as opposed to providing a real-life example.
Your mission is to design a project to build a typical business system. This system was designed using The Method, but that fact is immaterial in this chapter. In general, the input to the project design effort should include the following ingredients:
The static architecture. You use the static architecture to create the initial list of coding activities.
Call chains or sequence diagrams. You produce the call chains or sequence diagrams by examining the use cases and how they propagate through the system. These provide the rough cut of structural activity dependencies.
List of activities. You list all activities, coding and noncoding alike.
Duration estimation. For each activity, you accurately estimate the duration (and resources) involved (or work with others to do so).
Planning assumptions. You capture the assumptions you have about staffing, availability, ramp-up time, technology, quality, and so on. You typically will have several such sets of assumptions, with each set resulting in a different project design solution.
Some constraints. You write down all the explicitly known constraints. You should also include possible or likely constraints, and plan accordingly. You will see multiple examples in this chapter for handling constraints.
Figure 11-1 shows the static architecture of the system. As you can tell, the system is fairly limited in size. It includes two Clients, five business logic components, three ResourceAccess components, two Resources, and three Utilities.
Figure 11-1 The system static architecture
While the system in Figure 11-1 was inspired by a real system, the merits of this particular architecture are irrelevant in this chapter. When designing the project, you should avoid turning the project design effort into a system design review. Even poor architectures should have adequate project design to maximize the chance of meeting your commitments.
The system has only two core use cases and two call chains. The first call chain, shown in Figure 11-2, concludes with publishing an event. The second call chain, shown in Figure 11-3, depicts the processing of that event by the subscribers.
Figure 11-2 Call chain 1
Figure 11-3 Call chain 2
You should examine the call chains, and lay out a first draft of the dependencies between components in the architecture. You start with all the arrows connecting components, regardless of transport or connectivity, and consider each as a dependency. You should account for any dependency exactly once. However, typically the call chain diagrams do not show the full picture because they often omit repeated implicit dependencies. In this case, all components of the architecture (except the Resources) depend on Logging, and the Clients and Managers depend on the Security component. Armed with that additional information, you can draw the dependency chart shown in Figure 11-4.
Figure 11-4 Initial dependency chart
As you can see, even with a simple system having only two use cases, the dependency chart is cluttered and hard to analyze. A simple technique you can leverage to reduce the complexity is to eliminate dependencies that duplicate inherited dependencies. Inherited dependencies are due to transitive dependencies1—those dependencies that an activity implicitly inherits by depending on other activities. In Figure 11-4, Client A depends on Manager A and Security; Manager A also depends on Security. This means you can omit the dependency between Client A and Security. Using inherited dependencies, you can reduce Figure 11-4 to Figure 11-5.
Figure 11-5 Dependency chart after consolidating inherited dependencies
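The consolidation of inherited dependencies can be sketched as a transitive reduction of the dependency graph. The snippet below is a minimal illustration, not the book's tooling: the function names `reachable` and `remove_inherited` are hypothetical, and the `depends_on` data contains only the Client A, Manager A, and Security example from the text rather than the full chart of Figure 11-4.

```python
def reachable(graph, start):
    """Return everything `start` depends on, directly or transitively."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def remove_inherited(graph):
    """Drop each direct dependency that is already inherited through another one."""
    reduced = {}
    for node, deps in graph.items():
        inherited = set()
        for dep in deps:
            inherited |= reachable(graph, dep)
        reduced[node] = sorted(d for d in deps if d not in inherited)
    return reduced


# Only the example from the text; the other components are omitted.
depends_on = {
    "Client A": ["Manager A", "Security"],
    "Manager A": ["Security"],
    "Security": [],
}

reduced = remove_inherited(depends_on)
print(reduced["Client A"])  # ['Manager A']
```

Client A still depends on Security after the reduction, but only implicitly through Manager A, which is what makes the chart easier to read.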
While Figure 11-5 is certainly simpler than Figure 11-4, it is still inadequate because it is highly structural in nature, showing only the coding activities. You must compile a comprehensive list of all activities in the project. In this case, the list of noncoding activities includes additional work on requirements, architecture (such as technology verification or a demo service), project design, test plan, test harness, and system testing. Table 11-1 lists all activities in the project, their duration estimation, and their dependencies on preceding activities.
Table 11-1 Activities, duration, and dependencies
| ID | Activity | Duration (days) | Depends On |
|---|---|---|---|
| 1 | Requirements | 15 | |
| 2 | Architecture | 20 | 1 |
| 3 | Project Design | 20 | 2 |
| 4 | Test Plan | 30 | 3 |
| 5 | Test Harness | 35 | 4 |
| 6 | Logging | 15 | 3 |
| 7 | Security | 20 | 3 |
| 8 | Pub/Sub | 5 | 3 |
| 9 | Resource A | 20 | 3 |
| 10 | Resource B | 15 | 3 |
| 11 | ResourceAccess A | 10 | 6,9 |
| 12 | ResourceAccess B | 5 | 6,10 |
| 13 | ResourceAccess C | 15 | 6 |
| 14 | Engine A | 20 | 12,13 |
| 15 | Engine B | 25 | 12,13 |
| 16 | Engine C | 15 | 6 |
| 17 | Manager A | 20 | 7,8,11,14,15 |
| 18 | Manager B | 25 | 7,8,15,16 |
| 19 | Client App1 | 25 | 17,18 |
| 20 | Client App2 | 35 | 17 |
| 21 | System Testing | 30 | 5,19,20 |
With the list of activities and dependencies at hand, you can draw the project network as an arrow diagram. Figure 11-6 shows the initial network diagram. The numbers in this figure correspond to the activity IDs in Table 11-1. The bold lines and numbers indicate the critical path.
Figure 11-6 Initial network diagram
As defined in Chapter 8, a milestone is an event in the project denoting the completion of a significant part of the project, including major integration achievements. Even at this early stage in the project design, you should designate the event completing Project Design (activity 3) as the SDP review milestone, M0. In this case, M0 is the completion of the front end (short for fuzzy front end) of the project, comprising requirements, architecture, and project design. This makes the SDP review an explicit part of the plan. You can have milestones on or off the critical path, and they can be public or private. Public milestones demonstrate progress for management and customers, while private milestones are internal hurdles for the team. If a milestone is outside the critical path, it is a good idea to keep it private since it could move as a result of a delay somewhere upstream from it. On the critical path, milestones can be both private and public, and they correlate directly with meeting the commitments of the project in terms of both time and cost. Another use for milestones is to force a dependency even if the call chains do not specify such a dependency. The SDP review is such a milestone—none of the construction activities should start before the SDP review. Such forced-dependency milestones also simplify the network, and you will see another example shortly.
You can construct the network of activities listed in Table 11-1 in a project planning tool, which gives you a first look at project duration. Doing so gives a duration of 9.0 months for this project. However, without resource assignment, it is not yet possible to determine the cost of the project.
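As a rough cross-check of what a planning tool computes, the duration can be derived from Table 11-1 with a forward pass over the dependencies. The sketch below is illustrative only: the function names are hypothetical, and the conversion from working days to months (5 working days per week, 52 weeks per 12 months) is an assumption that the text does not state. The critical path printed here is simply what the table data implies.

```python
# Table 11-1 as data: activity ID -> (name, duration in days, IDs it depends on).
ACTIVITIES = {
    1: ("Requirements", 15, []),
    2: ("Architecture", 20, [1]),
    3: ("Project Design", 20, [2]),
    4: ("Test Plan", 30, [3]),
    5: ("Test Harness", 35, [4]),
    6: ("Logging", 15, [3]),
    7: ("Security", 20, [3]),
    8: ("Pub/Sub", 5, [3]),
    9: ("Resource A", 20, [3]),
    10: ("Resource B", 15, [3]),
    11: ("ResourceAccess A", 10, [6, 9]),
    12: ("ResourceAccess B", 5, [6, 10]),
    13: ("ResourceAccess C", 15, [6]),
    14: ("Engine A", 20, [12, 13]),
    15: ("Engine B", 25, [12, 13]),
    16: ("Engine C", 15, [6]),
    17: ("Manager A", 20, [7, 8, 11, 14, 15]),
    18: ("Manager B", 25, [7, 8, 15, 16]),
    19: ("Client App1", 25, [17, 18]),
    20: ("Client App2", 35, [17]),
    21: ("System Testing", 30, [5, 19, 20]),
}


def earliest_finish(activities):
    """Forward pass: earliest finish day of every activity."""
    finish = {}

    def visit(activity_id):
        if activity_id not in finish:
            _, duration, deps = activities[activity_id]
            finish[activity_id] = duration + max((visit(d) for d in deps), default=0)
        return finish[activity_id]

    for activity_id in activities:
        visit(activity_id)
    return finish


def critical_path(activities, finish):
    """Walk back from the last activity through its latest-finishing predecessor."""
    path = [max(finish, key=finish.get)]
    while activities[path[-1]][2]:
        path.append(max(activities[path[-1]][2], key=finish.get))
    return path[::-1]


finish = earliest_finish(ACTIVITIES)
total_days = max(finish.values())
path = critical_path(ACTIVITIES, finish)

# Assumed calendar: 5 working days per week, 52 weeks per 12 months.
WORKING_DAYS_PER_MONTH = 5 * 52 / 12
months = total_days / WORKING_DAYS_PER_MONTH

print(total_days)        # 195
print(round(months, 1))  # 9.0
print(path)              # [1, 2, 3, 6, 13, 15, 17, 20, 21]
```

Under the assumed calendar, the 195 working days on the longest path correspond to the 9.0 months quoted above.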
To proceed with the design, you itemize the planning assumptions, especially the planned staffing requirements, in a list such as the following:
One project manager is required throughout the project.
One product manager is required throughout the project.
One architect is required throughout the project.
One developer is required per service for any coding activity. Once that service is complete, the developer can move to another activity.
One database architect is required for each of the Resources. This work is independent of the code development work and can be done in parallel.
One tester is required from the start of construction of the system services until the end of testing.
One additional tester is required during system testing.
One test engineer is required for the test plan and test harness activities.
One DevOps specialist is required from the start of construction until the end of testing.
This list is, in fact, the list of resources you need to complete the project. Also note the structure of the list: “one X for Y.” If you cannot state the required staffing this way, you probably do not understand your own staffing requirements, or you are missing a key planning assumption.
You should explicitly make two additional planning assumptions about the developers regarding testing time and idle time. First, in this example project, developers will produce such high-quality work that they will not be needed during system testing. Second, developers between activities are considered a direct cost. Strictly speaking, idle time should be accounted for as an indirect cost because it is not associated with project activities, yet the project must pay for it. However, many project managers strive to assign idle developers some activities in support of other development activities, even if that means more than one developer is temporarily assigned per service. Under this planning assumption, you still account for developers between activities as direct cost.
Each activity in a project always belongs to a phase, or a type of activity. Typical phases include the front end, design, infrastructure, services, UI, testing, and deployment, among others. A phase may contain any number of activities, and the activities in a phase can overlap on the timeline. What is less obvious is that the phases are not sequential and can themselves overlap or even start and stop. The easiest way of laying out phases is to structure the planning assumptions list into a role/phase table; Table 11-2 provides an example.
Table 11-2 Roles and phases
| Role | Front End | Infrastructure | Services | Testing |
|---|---|---|---|---|
| Architect | X | X | X | X |
| Project Manager | X | X | X | X |
| Product Manager | X | X | X | X |
| DevOps | | X | X | X |
| Developers | | X | X | |
| Testers | | | X | X |
In much the same way, you could add other roles that are required for the duration of an entire phase, such as UX (user experience) or security experts. However, you should not include roles that are required only for specific activities, such as the test engineer.
Table 11-2 is a crude form of a staffing distribution view. The relationship between roles and phases is essential when building the staffing distribution chart because you must account for the use of all the resources, regardless of whether they are assigned to specific project activities. For example, in Table 11-2, an architect is required throughout the project. In turn, in the staffing distribution chart, you would show the architect across the duration of the project. In this way, you can account for all resources necessary to produce the correct staffing distribution chart and cost calculation.
With the list of activities, dependencies, and planning assumptions in hand, you proceed to iteratively find the normal solution. For the first pass, assume that you have unlimited resources at your disposal, but you will utilize only as many resources as required to progress unimpeded along the critical path. This provides the least constrained way of building the system at the lowest level of resources.
Initially, also assume that you have unlimited staffing elasticity. You could make very minor (if any) adjustments for reality. For example, there is no point in hiring a person for a single week if you can trade some float and avoid any need for that resource. You assume that every special skill set is available when needed. These liberal assumptions should yield the same project duration as you had before assigning resources. Indeed, after staffing the project this way, the duration remains at 9.0 months and yields the planned earned value chart shown in Figure 11-7. This chart exhibits the general shape of a shallow S curve but it is not as smooth as it should be.
Figure 11-7 Planned earned value with unlimited resources
Following the process outlined in Chapter 7, Figure 11-8 shows the corresponding project staffing distribution chart. This plan uses as many as four developers and two database architects, uses one test engineer, and does not consume any float. The calculated project total cost is 58.3 man-months.
Figure 11-8 Staffing distribution with unlimited resources
Recall from Chapter 9 that finding the normal solution is an iterative process (see Figure 9-8) simply because the lowest level of staffing is not known at the beginning of the design effort. Therefore, this first set of results is not yet the normal solution. In the next iteration you should accommodate reality, consume float to decrease staffing volatility, address any obvious design flaws, and reduce complexity if possible.
The first iteration on the normal solution suffers from several key problems. First, it assumes unlimited and readily available resources, including those with special skills. Clearly, resources are not limitless, and special skills are rare. Second, the planned staffing distribution chart (see Figure 11-8) displays a concerning sign (identified in Chapter 7), namely a high ramp coming into the project. You should expect this behavior due to the assumptions about staffing availability and elasticity. Third, over its duration, the project engages some resources only for short periods of time. This is asking for trouble as far as availability and the necessary onboarding time are concerned. While you could plan to mitigate that by using subcontractors, you should not create problems that you need to solve. The staffing distribution should be smooth, and you should avoid high ramps and sharp drops. You should create other project variations that have some resource constraints and entertain a more realistic staffing elasticity. Often, this will also smooth both the staffing distribution and the planned earned value charts.
The last problem with the solution so far is the integration pressure on the Manager services. From Table 11-1 and the network diagram of Figure 11-6, you can see that the Managers (activities 17 and 18) are expected to integrate with four or five other services. Ideally, you should integrate only one or two services at a time. Integrating more than two services concurrently will likely result in a nonlinear increase in complexity because any issues across services will be superimposed on each other. The problem is further compounded because the integrations occur toward the end of the project, when you have little runway left to fix issues.
A common technique to simplify the project is to move the infrastructure services (Utilities such as Logging, Security, and Pub/Sub, and any additional infrastructure activity such as build automation) to the beginning of the project, regardless of their natural dependencies in the network. In other words, immediately after M0, the developers will work on these infrastructure services. You can even introduce a milestone called M1 denoting when the infrastructure is complete, making all other services depend on M1, as shown in the subnetwork in Figure 11-9.
Figure 11-9 Infrastructure first
Completing the infrastructure first reduces the complexity in the network (decreases the number of dependencies and crossing lines) and alleviates the integration pressure at the Managers. Overriding the original dependencies in this way typically reduces the initial staffing demand because none of the other services can start until M1 is complete. It also reduces the staffing volatility and usually results in a smoother staffing distribution and a gradual ramp-up at the beginning of the project.
Another important advantage of developing the infrastructure first is the early access it provides to key infrastructure components. This allows developers to integrate their work with the infrastructure as they construct the system, rather than having to retrofit and test infrastructure services (such as Logging or Security) after the fact. Having the infrastructure services available before the business-related components (ResourceAccess, Engines, Managers, and Clients) is almost always an excellent idea, even if the need is not evident at first.
Developing the infrastructure first changes the initial staffing to three developers (one per service) until after M1, at which point the project can absorb a fourth developer (note that you are still working on a staffing plan with unlimited resources). Repeating the prior steps, the infrastructure-first plan extends the schedule by 3% to 9.2 months and incurs 2% of additional total cost, to 59 man-months. In exchange for the negligible additional cost and schedule, the project gains early access to key services and a simpler, more realistic plan. Going forward, this new project becomes the baseline for the next iteration.
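The effect of the M1 milestone on the schedule can be approximated by rewiring the dependencies of Table 11-1 and repeating the forward pass. The sketch below rests on several assumptions: Figure 11-9 is not reproduced here, so the rewiring rule (every dependency on Logging, Security, or Pub/Sub is routed through a zero-duration M1) is a guess at its intent; the function names are hypothetical; and the month conversion assumes 5 working days per week and 52 weeks per 12 months.

```python
# Table 11-1 without names: activity ID -> (duration in days, IDs it depends on).
ACTIVITIES = {
    1: (15, []), 2: (20, [1]), 3: (20, [2]), 4: (30, [3]), 5: (35, [4]),
    6: (15, [3]), 7: (20, [3]), 8: (5, [3]), 9: (20, [3]), 10: (15, [3]),
    11: (10, [6, 9]), 12: (5, [6, 10]), 13: (15, [6]), 14: (20, [12, 13]),
    15: (25, [12, 13]), 16: (15, [6]), 17: (20, [7, 8, 11, 14, 15]),
    18: (25, [7, 8, 15, 16]), 19: (25, [17, 18]), 20: (35, [17]),
    21: (30, [5, 19, 20]),
}
INFRASTRUCTURE = {6, 7, 8}  # Logging, Security, Pub/Sub


def infrastructure_first(activities, infrastructure, milestone="M1"):
    """Route every dependency on an infrastructure activity through one milestone."""
    rewired = {milestone: (0, sorted(infrastructure))}
    for activity_id, (duration, deps) in activities.items():
        kept = [d for d in deps if d not in infrastructure]
        if len(kept) < len(deps):
            kept.insert(0, milestone)  # the activity now waits for all of M1
        rewired[activity_id] = (duration, kept)
    return rewired


def project_days(activities):
    """Length in working days of the longest path through the network."""
    finish = {}

    def visit(activity_id):
        if activity_id not in finish:
            duration, deps = activities[activity_id]
            finish[activity_id] = duration + max((visit(d) for d in deps), default=0)
        return finish[activity_id]

    return max(visit(a) for a in activities)


baseline_days = project_days(ACTIVITIES)
infra_first_days = project_days(infrastructure_first(ACTIVITIES, INFRASTRUCTURE))

# Assumed calendar: 5 working days per week, 52 weeks per 12 months.
WORKING_DAYS_PER_MONTH = 5 * 52 / 12
print(baseline_days, infra_first_days)                        # 195 200
print(round(infra_first_days / WORKING_DAYS_PER_MONTH, 1))    # 9.2
print(round((infra_first_days / baseline_days - 1) * 100))    # 3
```

Under these assumptions, the extra five days come from Security (20 days) now gating the services that previously waited only for Logging (15 days), which is consistent with the roughly 3% schedule extension to 9.2 months.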
The resources you ask for may not always be available when you need them, so it is prudent to plan for fewer resources (at least initially) to mitigate that risk. How will the project behave if three developers are unavailable at the beginning of the project? If no developers at all are available, the architect can develop the infrastructure, or the project can engage subcontractors: The infrastructure services do not require domain knowledge, so they are good candidates for such external and readily available resources. If only one developer is available at the beginning, then that single developer can do all infrastructure components serially. Perhaps only a single developer is available initially, and then a second developer can join in after the first activity is complete.
Choosing the latter, somewhat middle-of-the-road scenario of one and then two developers, recalculate the infrastructure-first project to see how the project behaves with limited resources. Instead of three parallel activities (one critical) at the beginning, as shown in Figure 11-9, you now have one activity that is critical, then two activities in parallel (one critical). Serializing activities in this way increases the duration of the project. This variation extends the schedule by 8% to 9.9 months and increases the total cost by 4% to 61.5 man-months. Figure 11-10 shows the resulting staffing distribution chart. Note the gradual phasing in of the developers, from one, to two, to four.
Figure 11-10 Staffing distribution of infrastructure-first with limited resources
Extending the critical path by limiting the resources also increases the float of the noncritical activities that span that section of the network. Compared with unlimited resources, the float of the Test Plan (activity 4 in Figure 11-6) and Test Harness (activity 5 in Figure 11-6) is increased by 30%, the float of Resource A (activity 9 in Figure 11-6) is increased by 50%, and the float of Resource B (activity 10 in Figure 11-6) is increased by 100%. This is noteworthy because a seemingly minute change in resource availability has increased the float dramatically. Be aware that this knife can cut both ways: Sometimes a seemingly innocuous change can cause the floats to collapse and derail the project.
On top of limiting the initial availability of the developers, suppose the project did not get the database architects called for in the previous solutions. This is certainly a real-life scenario, since such qualified resources are often hard to come by. In this case, developers design the databases to the best of their abilities. To see how the project responds to this new limit, instead of just adding developers, constrain the project to no more than four developers (allowing for more developers would be identical to having the database architects). Surprisingly, this does not change the duration and results in a total cost of 62.7 man-months, a mere 2% increase. The reason is that the same four developers start working earlier and do not even have to consume float.
Since the project could easily cope with four developers, the next limited-resources plan caps the available developers at three. This also does not change the duration of the project because it is possible to trade the fourth developer for some float. As for cost, there is a 3% reduction in cost to 61.1 man-months due to the more efficient use of developers. Figure 11-11 shows the resulting staffing distribution chart.
Figure 11-11 Staffing distribution with three developers and one test engineer
Note the use of the test engineer along with the three developers. Figure 11-11 is the best staffing distribution so far, looking very much like the expected pattern from Chapter 7 (see Figure 7-8).
Figure 11-12 shows the planned earned value, which follows a fairly smooth shallow S curve. If anything, the shallow S is almost too shallow. You will see the meaning of that later on.
Figure 11-12 Planned earned value with three developers and one test engineer
Figure 11-13 shows the corresponding network diagram, using the absolute criticality float color-coding scheme described in Chapter 8. This example project uses 9 days as the upper limit for red activities and 26 days as the upper limit for the yellow activities. The activity IDs appear above the arrows in black, and the float values are shown below the line in the arrow’s color. The test engineer’s activities—that is, the Test Plan (activity 4) and the Test Harness (activity 5)—have a very high float of 65 days. Note the M0 milestone terminating the front end and the M1 milestone at the end of the infrastructure. The diagram also shows the phasing in of the resources between M0 and M1 to build the infrastructure (activities 6, 7, and 8).
Figure 11-13 Network diagram with three developers and one test engineer
The next experiment in resource reduction is to remove the test engineer but keep the three developers. Once again, this design solution results in no change to the duration and cost of the project. The third developer simply takes over the test engineer’s activities after completing other lower-float activities already assigned. The problem is that deferring the Test Plan and Test Harness activities to much later consumes 77% of their float (from 65 days to 15 days). This is very risky because if the float drops by 100%, the project is delayed.
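The float arithmetic in this step can be made concrete with a small sketch. The red and yellow limits (9 and 26 days) come from the text; treating them as inclusive, calling anything above the yellow limit "green", and the function names themselves are assumptions made for illustration.

```python
RED_LIMIT = 9      # days; upper limit for red activities (from the text)
YELLOW_LIMIT = 26  # days; upper limit for yellow activities (from the text)


def float_color(float_days):
    """Classify an activity by its absolute float; limits assumed inclusive."""
    if float_days <= RED_LIMIT:
        return "red"
    if float_days <= YELLOW_LIMIT:
        return "yellow"
    return "green"  # assumed label for everything above the yellow limit


def float_consumed(before_days, after_days):
    """Fraction of the original float that a change uses up."""
    return (before_days - after_days) / before_days


before, after = 65, 15  # Test Plan and Test Harness float, from the text
print(float_color(before), float_color(after))  # green yellow
print(f"{float_consumed(before, after):.0%}")   # 77%
```

Seen this way, removing the test engineer leaves the duration and cost untouched but moves both testing activities from a comfortable float into the yellow band.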
Chapter 9 explained the importance of presenting the effects of subcritical staffing to decision makers. Too often, decision makers are unaware of the impracticality of cutting back on resources to supposedly reduce costs. The example project becomes subcritical by limiting the number of developers to just two and eliminating the test engineer. By the time activities on the critical path are scheduled to start, some supporting noncritical activities are not ready yet, so they impede the old critical path. The limiting factor now is not the duration of the critical path, but rather the availability of the two developers. Consequently, the old network (and specifically the old critical path) no longer applies. You must therefore redraw the network diagram to reflect the dependency on the two developers.
Recall from Chapter 7 that resource dependencies are dependencies and that the project network is a dependency network, not just an activity network. You therefore add the dependency on the resources to the network. You actually have some flexibility in designing the network: As long as the natural dependency between the activities is satisfied, the actual order of the activities can vary. To create the new network, you assign the two resources, as always, based on float. Each developer takes on the next lowest-float activity available after finishing with the current activity. At the same time, you add a dependency between the developer’s next activity and the current one to reflect the dependency on the developer. Figure 11-14 shows the subcritical network diagram for the example project.
Figure 11-14 Subcritical solution network diagram
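The float-based assignment described above can be sketched as a simple list-scheduling loop: whenever a developer becomes idle, that developer takes the ready activity with the lowest float, and consecutive activities of the same developer become resource dependencies. This is a simplified illustration under stated assumptions, not a reconstruction of Figure 11-14. It uses the original Table 11-1 dependencies without the M1 milestone, assumes developers perform activities 4 through 20 (there is no test engineer and no database architect), ranks activities by their float in the unconstrained network, and uses hypothetical function names. It therefore should not be expected to reproduce the 13.4-month figure.

```python
# Table 11-1 without names: activity ID -> (duration in days, IDs it depends on).
ACTIVITIES = {
    1: (15, []), 2: (20, [1]), 3: (20, [2]), 4: (30, [3]), 5: (35, [4]),
    6: (15, [3]), 7: (20, [3]), 8: (5, [3]), 9: (20, [3]), 10: (15, [3]),
    11: (10, [6, 9]), 12: (5, [6, 10]), 13: (15, [6]), 14: (20, [12, 13]),
    15: (25, [12, 13]), 16: (15, [6]), 17: (20, [7, 8, 11, 14, 15]),
    18: (25, [7, 8, 15, 16]), 19: (25, [17, 18]), 20: (35, [17]),
    21: (30, [5, 19, 20]),
}
# Assumption: developers perform activities 4 through 20.
DEVELOPER_WORK = set(range(4, 21))


def total_floats(activities):
    """Total float per activity from a forward and a backward pass."""
    early = {}

    def visit(a):
        if a not in early:
            duration, deps = activities[a]
            early[a] = duration + max((visit(d) for d in deps), default=0)
        return early[a]

    end = max(visit(a) for a in activities)
    late = dict.fromkeys(activities, end)
    # Successors always finish later than predecessors, so this order is safe.
    for a in sorted(activities, key=early.get, reverse=True):
        duration, deps = activities[a]
        for d in deps:
            late[d] = min(late[d], late[a] - duration)
    return {a: late[a] - early[a] for a in activities}


def schedule(activities, developer_work, developers):
    """Whenever a developer is idle, start the ready activity with the lowest float."""
    floats = total_floats(activities)
    start, finish, assigned = {}, {}, {}
    free_at = [0] * developers
    pending, now = set(activities), 0
    while pending:
        ready = sorted(
            (a for a in pending
             if all(d in finish and finish[d] <= now for d in activities[a][1])),
            key=lambda a: (floats[a], a),
        )
        for a in ready:
            if a in developer_work:
                idle = [i for i in range(developers) if free_at[i] <= now]
                if not idle:
                    continue  # every developer is busy; retry at the next event
                assigned[a] = idle[0]
                free_at[idle[0]] = now + activities[a][0]
            start[a], finish[a] = now, now + activities[a][0]
            pending.remove(a)
        if pending:
            now = min(t for t in finish.values() if t > now)  # next finish event
    return start, finish, assigned


def resource_dependencies(start, assigned):
    """Pairs (previous, next) of consecutive activities done by the same developer."""
    pairs = []
    for dev in set(assigned.values()):
        mine = sorted((a for a in assigned if assigned[a] == dev), key=start.get)
        pairs.extend(zip(mine, mine[1:]))
    return pairs


start, finish, assigned = schedule(ACTIVITIES, DEVELOPER_WORK, developers=2)
print(max(finish.values()))                    # project length in working days
print(resource_dependencies(start, assigned))  # extra arrows for the network
```

Each pair returned by `resource_dependencies` is a dependency you would add to the network diagram, which is why the result tends toward two long strings of activities, one per developer.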
Given that only two developers are performing most of the work, the subcritical network diagram looks like two long strings. One string of activities is the long critical path; the other string is the second developer back-filling on the side. This long critical path increases the risk to the project because the project now has more critical activities. In general, subcritical projects are always high-risk projects.
In the extreme case of having only a single developer, all activities in the project are critical, the network diagram is one long string, and the risk is 1.0. The duration of the project equates to the sum of all activities, but, due to the maximum risk, even that duration is likely to be exceeded.
与三名开发人员和一名测试工程师的有限资源解决方案相比,由于活动的串行化,项目工期延长了 35%,达到 13.4 个月。尽管使用了较小的开发团队,但由于工期较长和间接成本不断累积,项目总成本增加了 25%,达到 77.6 人月。这一结果清楚地说明:亚临界人员配置实际上并不能节省成本。
Compared with the limited-resources solution of three developers and one test engineer, the project duration is extended by 35% to 13.4 months due to the serialization of the activities. While using a smaller development team, the project total cost is increased by 25% to 77.6 man-months due to the longer duration and the mounting indirect cost. This result clearly demonstrates the point: There really is no cost saving with subcritical staffing.
图 11-15显示了亚临界计划挣值。你可以看到,假定的浅 S 曲线几乎是一条直线。
Figure 11-15 shows the subcritical planned earned value. You can see that the supposed shallow S curve is almost a straight line.
图 11-15 亚临界计划挣值
Figure 11-15 Subcritical planned earned value
在只有一名开发人员完成所有工作的极端情况下,计划挣值是一条直线。一般来说,计划挣值图表中缺乏曲率是次临界项目的明显标志。即使是图 11-12中略显贫乏的浅 S 曲线也表明该项目已接近次临界状态。
In the extreme case of only a single developer doing all the work, the planned earned value is a straight line. In general, a lack of curvature in the planned earned value chart is a telltale sign for a subcritical project. Even the somewhat anemic shallow S curve in Figure 11-12 indicates the project is close to becoming subcritical.
在寻找正常解决方案的过程中,我们尝试了多种资源和网络设计组合。在所有这些尝试中,迄今为止最好的解决方案是迭代 5(依赖于三名开发人员和一名测试工程师),原因如下:
The search for the normal solution has involved several attempts using combinations of resources and network designs. Out of all of these, the best solution so far was Iteration 5 (which relied on three developers and one test engineer) for several reasons:
该解决方案符合正常解决方案的定义,利用最低级别的资源,使项目能够沿着关键路径畅通无阻地进展。
This solution complies with the definition of the normal solution by utilizing the lowest level of resources that allows the project to progress unimpeded along the critical path.
该解决方案解决了数据库架构师等专家的访问限制,同时又不损害关键资源——测试工程师。
This solution works around limitations of access to experts such as database architects while not compromising on a key resource, the test engineer.
该解决方案并不期望所有开发人员同时开始工作。
This solution does not expect all the developers to start working at once.
人员分配图和计划挣值图均表现出可接受的行为。
Both the staffing distribution chart and the planned earned value chart exhibit acceptable behavior.
正如预期的那样,该解决方案的前端占项目持续时间的 25%,并且该项目的效率为 23%,这是可以接受的。回想一下第 7 章,大多数项目的效率数字不应超过 25%。
The front end of this solution encompasses, as expected, 25% of the duration of the project, and the project has an acceptable efficiency of 23%. Recall from Chapter 7 that the efficiency number should not exceed 25% for most projects.
本章的其余部分使用迭代 5 作为正常解决方案,并作为其他迭代的基线。表 11-3总结了正常解决方案的各种项目指标。
The rest of the chapter uses Iteration 5 as the normal solution and as the baseline for the other iterations. Table 11-3 summarizes the various project metrics of the normal solution.
表11-3常规解决方案项目指标
Table 11-3 Project metrics of the normal solution
| 项目指标 Project Metric | 值 Value |
|---|---|
| 总成本(人月) Total cost (man-months) | 61.1 |
| 直接成本(人月) Direct cost (man-months) | 21.8 |
| 持续时间(月) Duration (months) | 9.9 |
| 平均人员配置 Average staffing | 6.1 |
| 峰值人员配置 Peak staffing | 9 |
| 平均开发人员数 Average developers | 2.3 |
| 效率 Efficiency | 23% |
| 前端 Front End | 25% |
有了正常的解决方案,您可以尝试压缩项目并查看某些压缩技术的效果如何。没有一种正确的方法来压缩项目。您必须对可用性、复杂性和成本做出假设。第 9 章讨论了各种压缩技术。一般来说,最好的策略是从压缩项目的较简单的方法开始。出于演示目的,本章展示了如何使用几种技术来压缩项目。您的具体情况会有所不同。您可以选择仅应用这里讨论的几种技术和想法,仔细权衡每种压缩解决方案的含义。
With the normal solution in place, you can try to compress the project and see how well certain compression techniques work. There is no single correct way of compressing a project. You will have to make assumptions about availability, complexity, and cost. Chapter 9 discussed a variety of compression techniques. In general, the best strategy is to start with the easier ways of compressing the project. For demonstration purposes, this chapter shows how to compress the project using several techniques. Your specific case will be different. You may choose to apply only a few of the techniques and the ideas discussed here, weighing carefully the implications of each compressed solution.
压缩任何项目的最简单方法是使用更好的资源。这不需要对项目网络或活动进行任何更改。虽然这是最简单的压缩形式,但由于此类资源的可用性,它可能不是最容易的(更多内容请参见第 14 章)。这里的目的是衡量项目将如何响应使用更好的资源进行压缩,或者是否值得追求,如果值得,如何去做。
The simplest way of compressing any project is to use better resources. This requires no changes to the project network or the activities. Although the simplest form of compression, it may not be the easiest due to the availability of such resources (more on that in Chapter 14). The purpose here is to gauge how the project will respond to compressing with better resources, or even if it is worth pursuing, and, if it is, how to do so.
假设您有一位顶级开发人员,他执行编码活动的速度比您现有的开发人员快 30%。这样的顶级开发人员的成本溢价很可能远高于 30%。在这个项目中,您可以假设顶级开发人员的成本比普通开发人员高出 80%。
Suppose you have access to a top developer who can perform coding activities 30% faster than the developers you already have. Such a top developer is likely to command a premium well above 30% over the cost of a regular developer. In this project you can assume the top developer costs 80% more than a regular developer.
理想情况下,您只会在关键路径上分配这样的资源,但这并不总是可行的(回想一下第 7 章中关于任务连续性的讨论)。正常的基线解决方案会将两名开发人员分配到关键路径,而您的目标是用顶级资源替换其中一名。要确定哪一个,您应该考虑活动数量和每个人在关键路径上花费的天数。
Ideally you would assign such a resource only on the critical path, but that is not always possible (recall the discussion of task continuity from Chapter 7). The normal baseline solution assigns two developers to the critical path, and your goal is to replace one of them with the top resource. To identify which one, you should consider both the number of activities and the number of days spent on the critical path per person.
表 11-4列出了常规解决方案中接触关键路径的两名开发人员、他们各自涉及的关键活动和非关键活动的数量,以及在关键路径上和关键路径外的总持续时间。显然,最好用顶级开发人员替换开发人员 2。
Table 11-4 lists the two developers in the normal solution who touch the critical path, the number of critical activities versus noncritical activities each has, and the total duration on the critical path and off the critical path. Clearly, it is best to replace Developer 2 with the top developer.
表 11-4开发人员、关键活动和持续时间
Table 11-4 Developers, critical activities, and duration
| 资源 Resource | 非关键活动 Noncritical Activities | 非关键持续时间(天) Noncritical Duration (days) | 关键活动 Critical Activities | 关键持续时间(天) Critical Duration (days) |
|---|---|---|---|---|
| 开发人员 1 Developer 1 | 4 | 85 | 2 | 35 |
| 开发人员 2 Developer 2 | 1 | 5 | 4 | 95 |
接下来,您需要重新查看表 11-1(每个活动的持续时间估计),确定开发人员 2 负责的活动,并使用 5 天分辨率将其持续时间向下调整 30%(使用更好资源的预期生产力增益)。使用新的活动持续时间,重复项目持续时间和成本分析,同时考虑开发人员 2 的额外 80% 加价。
Next, you need to revisit Table 11-1 (the duration estimations for each activity), identify the activities for which Developer 2 is responsible, and adjust their duration downward by 30% (the expected productivity gain with the better resource) using 5-day resolution. With the new activity durations, repeat the project duration and cost analysis while accounting for the additional 80% markup for Developer 2.
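The adjustment described above can be sketched in a few lines. This is only an illustration under stated assumptions: the text does not say how to round to the 5-day resolution, so the sketch assumes rounding to the nearest multiple of 5 days (ties rounding up) with a floor of one 5-day step, and the sample durations are hypothetical stand-ins for the Table 11-1 values. The function name `compress_duration` is made up for this example.

```python
from fractions import Fraction
from math import floor


def compress_duration(days, gain_percent=30, resolution=5):
    """Shorten a duration by gain_percent and snap it to the resolution.

    Assumption: round to the nearest multiple of `resolution` days, with
    ties rounding up, and never go below one resolution step.
    """
    # Exact arithmetic avoids floating-point surprises on .5 boundaries.
    reduced = Fraction(days * (100 - gain_percent), 100)
    steps = floor(reduced / resolution + Fraction(1, 2))
    return max(steps, 1) * resolution


# Hypothetical sample durations in days; the real ones come from Table 11-1.
for original in (10, 15, 20, 30):
    print(original, "->", compress_duration(original))
```

A different rounding policy (always rounding up, for example) would be more conservative and would shrink the compression further.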
图 11-16显示了使用顶级开发人员压缩之前和之后网络图上的关键路径。
Figure 11-16 shows the critical path on the network diagram before and after compressing with the top developer.
图 11-16具有一名顶级开发人员的新关键路径
Figure 11-16 New critical path with one top developer
新项目工期为 9.5 个月,仅比原正常解决方案的工期短 4%。差异如此之小是因为出现了一条新的关键路径,而这条新路径阻碍了项目的进展。工期如此微小的缩短是一个相当常见的结果。即使该单一资源比其他团队成员的生产力高得多,即使分配给该顶级资源的所有活动都完成得更快,分配给普通团队成员的活动的持续时间也不会受到顶级资源的影响,这些活动只会抑制压缩。
The new project duration is 9.5 months, only 4% shorter than the duration of the original normal solution. The difference is so small because a new critical path has emerged, and that new path holds the project back. Such a minuscule reduction in duration is a fairly common result. Even if that single resource is vastly more productive than the other team members, and even if all activities assigned to that top resource are done much faster, the durations of the activities assigned to regular team members are unaffected by the top resource, and those activities simply stifle the compression.
在成本方面,尽管顶级资源的成本增加了 80%,但压缩后的项目成本保持不变。这也是由于间接成本而预期的。大多数软件项目的间接成本都很高。即使缩短一点工期,也往往能收回压缩成本,至少在最初的压缩尝试中是如此,因为总成本曲线的最小值位于正常解决方案的左侧(见图9-10)。
In terms of cost, the compressed project cost is unchanged, despite having a top resource that costs 80% more. This, too, is expected because of the indirect cost. Most software projects have a high indirect cost. Reducing the duration even by a little tends to pay for the cost of compression, at least with the initial compression attempts, because the minimum of the total cost curve is to the left of the normal solution (see Figure 9-10).
您可以尝试使用多个顶级资源进行压缩。在此示例中,只需要求第二个顶级开发人员来替换开发人员 1 是有意义的,因为第三个顶级开发人员只能在关键路径之外分配。有了第二个顶级资源,压缩的效果更加明显:时间表减少了 11%,降至 8.5 个月,总成本减少了 3%,降至 59.3 人月。
You could try compressing with multiple top resources. In this example, it makes sense to ask for just a second top developer to replace Developer 1 because a third top developer could be assigned only outside the critical path. With a second top resource, the compression has a more noticeable effect: The schedule is reduced by an additional 11% to 8.5 months, and the total cost is reduced by 3% to 59.3 man-months.
通常,加速项目的唯一有效方法是引入并行工作。软件项目中有多种并行工作方式,其中一些更具挑战性。并行工作增加了项目的复杂性,因此您也应该首先考虑最简单、最容易的技术。
Often, the only meaningful way of accelerating projects is to introduce parallel work. There are multiple ways of working in parallel in a software project, some more challenging than others. Parallel work increases the project complexity, so here too you should consider the simplest and easiest techniques first.
在大多数设计良好的系统中,并行工作的最佳候选对象是基础架构和客户端设计,因为两者都独立于业务逻辑。之前,您看到了这种独立性在迭代 2 中发挥作用,它推动基础架构在 SDP 审查后立即启动。为了实现与客户端的并行工作,您可以将客户端拆分为单独的设计和开发活动。此类客户端相关设计活动通常包括 UX 设计、UI 设计和 API 或 SDK 设计(用于外部系统交互)。拆分客户端还支持更好地将客户端设计与后端系统分离,因为客户端应该为服务的消费者提供最佳体验,而不仅仅是反映底层系统。现在,您可以将基础设施开发和客户端设计活动移到与前端并行的位置。
The best candidates for parallel work in most well-designed systems are the infrastructure and the Client designs because both are independent of the business logic. Earlier, you saw this independence play out in Iteration 2, which pushed the infrastructure to start immediately after the SDP review. To enable parallel work with the Clients, you split the Clients into separate design and development activities. Such Client-related design activities typically include the UX design, the UI design, and the API or SDK design (for external system interactions). Splitting the Clients also supports better separation of Client designs from the back-end system because the Clients should provide the best experience for the consumers of the services, not merely reflect the underlying system. You can now move the infrastructure development and the Client design activities to be parallel to the front end.
然而,这一举措有两个缺点。较小的缺点是初始资金消耗率较高,这仅仅是因为一开始你需要开发人员和核心团队。较大的缺点是,在组织承诺项目之前开始工作往往会让组织决定继续进行,即使明智的做法是取消该项目。忽视沉没成本或对闪亮的 UI 模型抱有锚定偏见 [2] 是人的本性。
This move has two downsides, however. The lesser downside is the higher initial burn rate, which increases simply because you need developers as well as the core team at the beginning. The larger downside is that starting the work before the organization is committed to the project tends to make the organization decide to proceed, even if the smart thing to do is to cancel the project. It is human nature to disregard the sunk cost or to have an anchoring bias [2] attached to shining UI mockups.
2. https://en.wikipedia.org/wiki/Anchoring
我建议,只有当项目保证继续进行,并且 SDP 审查的目的仅仅是选择要实施的选项(并签署项目)时,才将基础设施和客户端设计移至前端。您可以通过仅将基础设施开发与前端并行,在 SDP 审查后继续进行客户端设计,来降低影响 SDP 决策的风险。最后,确保那些将 UI 工件等同于进展的人不会将客户端设计活动误解为重大进展。您应该将前端的工作与项目跟踪(参见附录 A)结合起来,以确保决策者正确解读项目的状态。
I recommend moving the infrastructure and the Client designs to the front end only if the project is guaranteed to proceed and the purpose of the SDP review is solely to select which option to pursue (and to sign off on the project). You could mitigate the risk of biasing the SDP decision by moving only the infrastructure development to be in parallel to the front end, proceeding with the Client designs after the SDP review. Finally, make sure that the Client design activities are not misconstrued as significant progress by those who equate UI artifacts with progress. You should combine the work in the front end with project tracking (see Appendix A) to ensure decision makers correctly interpret the status of the project.
在项目的其他部分,确定并行工作的其他机会更具挑战性。您必须发挥创造力,找到消除编码活动之间依赖关系的方法。这几乎总是需要投资于支持并行工作的额外活动,例如仿真器、模拟器和集成活动。您还必须拆分活动并从中提取新活动,以进行合同、接口、消息的详细设计或依赖服务的设计。这些明确的设计活动将与其他活动并行进行。
Identifying additional opportunities for parallel work is more challenging elsewhere in the project. You have to be creative and find ways of eliminating dependencies between coding activities. This almost always requires investing in additional activities that enable parallel work such as emulators, simulators, and integration activities. You also have to split activities and extract out of them new activities for the detailed design of contracts, interfaces, messages or the design of dependent services. These explicit design activities will take place in parallel to other activities.
这种并行工作没有固定的公式。你可以对一些关键活动或大多数活动这样做。你可以提前或在进行中执行其他活动。很快你就会意识到,消除编码活动之间的所有依赖关系实际上是不可能的,因为当所有路径都接近关键时,压缩的收益会递减。你将爬上项目的直接成本曲线,该曲线在最短工期点附近(见图9-3)的特点是坡度陡峭,需要更多的成本,而进度缩短却越来越少。
There is no set formula for this kind of parallel work. You could do it for a few key activities or for most activities. You could perform the additional activities up-front or on-the-go. Very quickly you will realize that eliminating all dependencies between coding activities is practically impossible because there are diminishing returns on compression when all paths are near-critical. You will be climbing the direct cost curve of the project, which, near the minimum duration point (see Figure 9-3), is characterized by a steep slope, requiring even more cost for less and less reduction in schedule.
回到示例,项目设计压缩的下一个迭代将基础架构与前端并行移动。它还将客户端活动拆分为一些前期设计工作(例如,需求、测试计划、UI 设计)和实际客户端开发,并将客户端设计移至前端。在此示例项目中,您可以假设客户端设计活动独立于基础架构,并且每个客户端都是唯一的。
Returning to the example, the next iteration of project design compression moves the infrastructure in parallel to the front end. It also splits the Client activities into some up-front design work (e.g., requirements, test plan, UI design) and actual Client development, and moves the Client designs to the front end. In this example project you can assume that the Client design activities are independent of the infrastructure and are unique per Client.
表 11-5列出了此压缩迭代的修订活动集、它们的持续时间及其依赖关系。
Table 11-5 lists the revised set of activities, their duration, and their dependencies for this compression iteration.
表 11-5以基础设施和客户设计为先的活动
Table 11-5 Activities with infrastructure and Client designs first
| ID | 活动 Activity | 持续时间(天) Duration (days) | 取决于 Depends On |
|---|---|---|---|
| 1 | 需求 Requirements | 15 | |
| 2 | 架构 Architecture | 20 | 1 |
| 3 | 项目设计 Project Design | 20 | 2 |
| 4 | 测试计划 Test Plan | 30 | 22 |
| 5 | 测试工具 Test Harness | 35 | 4 |
| 6 | 日志记录 Logging | 10 | |
| 7 | 安全 Security | 15 | 6 |
| 8 | 发布/订阅 Pub/Sub | 5 | 6 |
| 9 | 资源 A Resource A | 20 | 22 |
| 10 | 资源 B Resource B | 15 | 22 |
| 11 | 资源访问 A ResourceAccess A | 10 | 9,23 |
| 12 | 资源访问 B ResourceAccess B | 5 | 10,23 |
| 13 | 资源访问 C ResourceAccess C | 10 | 22,23 |
| 14 | 引擎 A EngineA | 15 | 12,13 |
| 15 | 引擎 B EngineB | 20 | 12,13 |
| 16 | 引擎 C EngineC | 10 | 22,23 |
| 17 | 管理器 A ManagerA | 15 | 14,15,11 |
| 18 | 管理器 B ManagerB | 20 | 15,16 |
| 19 | 客户端应用 1 Client App1 | 15 | 17,18,24 |
| 20 | 客户端应用 2 Client App2 | 20 | 17,25 |
| 21 | 系统测试 System Testing | 30 | 5,19,20 |
| 22 | M0 | 0 | 3 |
| 23 | M1 | 0 | 7,8 |
| 24 | 客户端应用 1 设计 Client App1 Design | 10 | |
| 25 | 客户端应用 2 设计 Client App2 Design | 15 | |
请注意,Logging(活动 6)以及随之而来的其余基础设施活动,连同新的客户端设计活动(活动 24 和 25),都可以在项目开始时启动。还请注意,实际的客户端开发活动(活动 19 和 20)现在更短,并且依赖于相应客户端设计活动的完成。
Note that Logging (activity 6), and therefore the rest of the infrastructure activities, along with the new Client design activities (activities 24 and 25), can start at the beginning of the project. Note also that the actual Client development activities (activities 19 and 20) are shorter now and depend on the completion of the respective Client design activities.
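To see how the dependencies in Table 11-5 alone drive the schedule, here is a minimal forward-pass sketch. It computes the earliest finish of every activity from the listed durations and predecessors only. It is an illustration, not the chapter's calculation: it ignores resource assignments, the top-developer adjustments, and any conversion from working days to calendar months, so the result is not expected to reproduce the month figures quoted in the text. The names `ACTIVITIES` and `earliest_finish` are invented for this example.

```python
from functools import lru_cache

# Activity ID -> (duration in days, predecessor IDs), copied from Table 11-5.
ACTIVITIES = {
    1: (15, ()), 2: (20, (1,)), 3: (20, (2,)), 4: (30, (22,)),
    5: (35, (4,)), 6: (10, ()), 7: (15, (6,)), 8: (5, (6,)),
    9: (20, (22,)), 10: (15, (22,)), 11: (10, (9, 23)),
    12: (5, (10, 23)), 13: (10, (22, 23)), 14: (15, (12, 13)),
    15: (20, (12, 13)), 16: (10, (22, 23)), 17: (15, (14, 15, 11)),
    18: (20, (15, 16)), 19: (15, (17, 18, 24)), 20: (20, (17, 25)),
    21: (30, (5, 19, 20)), 22: (0, (3,)), 23: (0, (7, 8)),
    24: (10, ()), 25: (15, ()),
}


@lru_cache(maxsize=None)
def earliest_finish(activity_id):
    """Earliest finish in days: latest predecessor finish plus own duration."""
    duration, predecessors = ACTIVITIES[activity_id]
    start = max((earliest_finish(p) for p in predecessors), default=0)
    return start + duration


project_days = max(earliest_finish(a) for a in ACTIVITIES)
print("Dependency-driven duration (days):", project_days)
print("M0 reached on day", earliest_finish(22), "and M1 on day", earliest_finish(23))
```

Because M1 (infrastructure complete) is reached well before M0 (the SDP review) in this pass, the infrastructure has room to start later, which is the observation the following paragraphs build on.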
通过将基础设施和客户设计活动移至前端来压缩此项目时,会出现几个潜在问题。第一个挑战是成本。即使由相同的资源连续完成,前端的持续时间现在也超过了基础设施和客户设计活动的持续时间。因此,与前端同时开始这项工作是浪费的,因为开发人员在最后会闲置。将基础设施和客户设计的启动推迟到它们变得至关重要时更经济。这将增加项目的风险,但会降低成本,同时仍能压缩项目。
Several potential issues arise when compressing this project by moving the infrastructure and the Client design activities to the front end. The first challenge is cost. The duration of the front end now exceeds the duration of the infrastructure and the Client design activities, even when done serially by the same resources. Therefore, starting that work simultaneously with the front end is wasteful because the developers will be idle toward the end. It is more economical to defer the start of the infrastructure and the Client designs until they become critical. This will increase the risk of the project, but reduce the cost while still compressing the project.
在此迭代中,您可以进一步降低成本,方法是先使用相同的两名开发人员开发基础设施;基础设施完成后,他们再进行客户端设计活动。由于资源依赖关系是依赖关系,因此您可以使客户端设计活动依赖于基础设施的完成(M1)。为了最大限度地压缩,前端使用的两名开发人员在 SDP 审查(M0)后立即继续进行其他项目活动(Resources)。在此特定情况下,为了正确计算浮动时间,您还使 SDP 审查依赖于客户端设计活动的完成。这会从客户端本身消除对客户端设计活动的依赖,并允许客户端从 SDP 审查中继承依赖关系。同样,在这种情况下,您之所以可以覆盖网络的依赖关系,只是因为前端比基础设施和客户端设计活动的总和还要长。表 11-6显示了网络的修订依赖关系(更改以红色标记)。
In this iteration, you can reduce the cost further by using the same two developers to develop the infrastructure first; after completing the infrastructure, they follow with the Client design activities. Since resource dependencies are dependencies, you make the Client design activities depend on the completion of the infrastructure (M1). To maximize the compression, the two developers used in the front end proceed to other project activities (the Resources) immediately after the SDP review (M0). In this specific case, to calculate the floats correctly, you also make the SDP review dependent on the completion of the Client design activities. This removes the dependency on the Client design activities from the Clients themselves and allows the Clients to inherit the dependency from the SDP review instead. Again, you can afford to override the dependencies of the network in this case only because the front end is longer than the infrastructure and the Client design activities combined. Table 11-6 shows the revised dependencies of the network (changes noted in red).
表 11-6首先修订了基础设施和客户端设计的依赖关系
Table 11-6 Revised dependencies with infrastructure and Client designs first
| ID | 活动 Activity | 持续时间(天) Duration (days) | 取决于 Depends On |
|---|---|---|---|
| 1 | 需求 Requirements | 15 | |
| … | … | … | … |
| 19 | 客户端应用 1 Client App1 | 15 | 17,18,~~24~~ |
| 20 | 客户端应用 2 Client App2 | 20 | 17,~~25~~ |
| 21 | 系统测试 System Testing | 30 | 5,19,20 |
| 22 | M0 | 0 | 3,24,25 |
| 23 | M1 | 0 | 7,8 |
| 24 | 客户端应用 1 设计 Client App1 Design | 10 | 23 |
| 25 | 客户端应用 2 设计 Client App2 Design | 15 | 23 |
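The claim that the network can afford these overrides because the front end outlasts the infrastructure and Client designs combined can be checked with a small sketch. It uses only the front-end, infrastructure, and Client design rows with the revised dependencies, and it assumes (as the text describes) two developers, so Security and Pub/Sub can run side by side, as can the two Client designs. The dictionary and function names are hypothetical.

```python
# Activity ID -> (duration in days, predecessor IDs), using the durations of
# Table 11-5 and the revised dependencies of Table 11-6.
ACTIVITIES = {
    1: (15, ()),            # Requirements
    2: (20, (1,)),          # Architecture
    3: (20, (2,)),          # Project Design
    6: (10, ()),            # Logging
    7: (15, (6,)),          # Security
    8: (5, (6,)),           # Pub/Sub
    23: (0, (7, 8)),        # M1: infrastructure complete
    24: (10, (23,)),        # Client App1 Design
    25: (15, (23,)),        # Client App2 Design
    22: (0, (3, 24, 25)),   # M0: SDP review
}


def earliest_finish(activity_id):
    """Earliest finish in days, following predecessors recursively."""
    duration, predecessors = ACTIVITIES[activity_id]
    return duration + max((earliest_finish(p) for p in predecessors), default=0)


front_end_days = earliest_finish(3)
designs_done = max(earliest_finish(24), earliest_finish(25))
slack = front_end_days - designs_done

print("Front end finishes on day", front_end_days)
print("Client designs finish on day", designs_done)
print("M0 on day", earliest_finish(22), "with", slack, "days of slack for a later start")
```

Under these assumptions M0 is still governed by the front end, so making the SDP review depend on the Client designs does not delay it, and the slack is what allows the infrastructure work to start later at lower cost.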
拆分活动的另一个挑战是客户整体的复杂性增加。您可以通过将客户设计活动和开发分配给上一次压缩迭代中的相同两名顶级开发人员来补偿这种复杂性。这会加剧顶级资源压缩的影响。但是,由于客户和项目现在更加复杂和苛刻,您应该进一步补偿这一点,假设构建客户所需的时间没有减少 30% (但开发人员的成本仍然增加 80%)。这些补偿已经反映在表 11-5和表 11-6中活动 19 和 20 的持续时间估算中。
The other challenge with splitting activities is the increased complexity of the Clients as a whole. You could compensate for that complexity by assigning the Client design activities and development to the same two top developers from the previous compression iteration. This compounds the effect of compression with top resources. However, since the Clients and the project are now more complex and demanding, you should further compensate for that by assuming there is no 30% reduction in the time it takes to build the Clients (but the developers still cost 80% more). These compensations are already reflected in the duration estimation of activities 19 and 20 in Table 11-5 and Table 11-6.
此次压缩迭代的结果是成本比上一个解决方案增加了 6%,达到 62.6 人月,工期缩短了 8%,达到 7.8 个月。图 11-17显示了结果网络图。
The result of this compression iteration is a cost increase of 6% from the previous solution to 62.6 man-months and a schedule reduction of 8% to 7.8 months. Figure 11-17 shows the resulting network diagram.
图 11-17基础设施和客户端设计网络图
Figure 11-17 Network diagram for infrastructure and Client designs first
检查图 11-17可以发现,在关键路径旁边还出现了一条近关键路径(活动 10、12、15、18 和 19)。这意味着任何进一步的压缩都需要将这两条路径压缩到相似的程度。只压缩其中一条路径效果不大,因为另一条路径决定了项目的持续时间。在这种情况下,最好寻找一个皇冠—即位于两条路径之上的大型活动。压缩皇冠会压缩两条路径。在此示例项目中,最佳候选者是客户端应用程序(活动 19 和 20)和管理者服务(活动 17 和 18)的开发。客户端和管理器是相对较大的活动,它们是这两条路径的皇冠。您可以尝试压缩客户端、管理器或两者。
Examining Figure 11-17 reveals that a near-critical path (activities 10, 12, 15, 18, and 19) has developed alongside the critical path. This means any further compression requires compressing both of these paths to a similar degree. Compressing just one of them will have little effect because the other path then dictates the duration of the project. In this kind of situation, it is best to look for a crown—that is, a large activity that sits on top of both paths. Compressing the crown compresses both paths. In this example project, the best candidates are the development of the Client apps (activities 19 and 20) and the Manager services (activities 17 and 18). The Clients and Managers are relatively large activities, and they crown both paths. You could try to compress the Clients, the Managers, or both.
当您为所依赖的管理器服务开发模拟器(参见第 9 章)并将客户端的开发移至网络上游某处,与其他活动并行时,压缩客户端就成为可能。由于没有模拟器可以完美替代真实服务,因此,一旦管理器完成,您还需要在客户端和管理器之间添加明确的集成活动。这实际上将每个客户端开发分为两个活动:第一个是针对模拟器的开发活动,第二个是针对管理器的集成活动。因此,客户端开发可能不会被压缩,但整个项目持续时间会缩短。
Compressing the Clients becomes possible when you develop simulators (see Chapter 9) for the Manager services on which they depend and move the development of the Clients somewhere upstream in the network, in parallel to other activities. Since no simulator is ever a perfect replacement for the real service, you also need to add explicit integration activities between the Clients and the Managers, once the Managers are complete. This in effect splits each Client development into two activities: The first is a development activity against the simulators, and the second is an integration activity against the Managers. As such, the Client development itself may not be compressed, but the overall project duration is shortened.
您可以通过开发Manager所依赖的Engines和ResourceAccess服务的模拟器来模仿这种方法,这样可以在项目早期开发Manager。但是,在设计良好的系统和项目中,这通常会困难得多。虽然模拟底层服务需要更多的模拟器并使项目网络变得非常复杂,但真正的问题是时间问题。这些模拟器的开发必须与它们应该模拟的服务的开发大致同时进行,因此您可以从这种方法中实现的实际压缩是有限的。您应该只在万不得已的情况下才考虑使用内部服务的模拟器。
You could mimic this approach by developing simulators for the Engines and ResourceAccess services on which the Managers depend, which enables development of the Managers earlier in the project. However, in a well-designed system and project, this would usually be far more difficult. Although simulating the underlying services would require many more simulators and make the project network very complex, the real issue is timing. The development of these simulators would have to take place more or less concurrently with the development of the very services they are supposed to simulate, so the actual compression you can realize from this approach is limited. You should consider simulators for the inner services only as a last resort.
在此示例项目中,最佳方法是仅模拟管理器。您可以通过使用模拟器压缩之前的压缩迭代(前端的基础设施和客户端设计)来复合它。压缩此迭代时适用一些新的规划假设:
In this example project, the best approach is to simulate the Managers only. You can compound the previous compression iteration (infrastructure and Client designs at the front end) by compressing it with simulators. A few new planning assumptions apply when compressing this iteration:
依赖项。模拟器可以在前端之后启动,并且它们还需要基础设施。这是通过对M0(活动 22)的依赖而继承的。
Dependencies. The simulators could start after the front end, and they also require the infrastructure. This is inherited with a dependency on M0 (activity 22).
额外的开发人员。当使用前一个压缩迭代作为起点时,需要两名额外的开发人员来开发模拟器和客户端实现。
Additional developers. When using the previous compression iteration as the starting point, two additional developers are required for the development of simulators and the Client implementations.
起点。通过将模拟器和客户端的工作推迟到它们变得至关重要时,可以降低两名额外开发人员的成本。但是,包含模拟器的网络往往相当复杂。您应该尽快从模拟器开始,以弥补这种复杂性,并使项目受益于更高的浮动时间,而不是降低成本。
Starting point. It is possible to reduce the cost of the two additional developers by deferring the work on the simulators and the Clients until they become critical. However, networks containing simulators tend to be fairly complex. You should compensate for that complexity by starting with the simulators as soon as possible and have the project benefit from higher float as opposed to lower cost.
表 11-7列出了活动和依赖关系的更改,同时使用前一次迭代作为基线解决方案并结合其规划假设(更改以红色标注)。
Table 11-7 lists the activities and the changes to dependencies, while using the previous iteration as the baseline solution and incorporating its planning assumptions (changes noted in red).
表 11-7使用管理器模拟器的活动
Table 11-7 Activities with Manager simulators
| ID | 活动 Activity | 持续时间(天) Duration (days) | 取决于 Depends On |
|---|---|---|---|
| 1 | 需求 Requirements | 15 | |
| … | … | … | … |
| 17 | 管理器 A ManagerA | 15 | … |
| 18 | 管理器 B ManagerB | 20 | … |
| 19 | 客户端应用 1 集成 Client App1 Integration | 15 | 17,18,28 |
| 20 | 客户端应用 2 集成 Client App2 Integration | 20 | 17,29 |
| … | … | … | … |
| 26 | 管理器 A 模拟器 ManagerA Simulator | 15 | 22 |
| 27 | 管理器 B 模拟器 ManagerB Simulator | 20 | 22 |
| 28 | 客户端应用 1 Client App1 | 15 | 26,27 |
| 29 | 客户端应用 2 Client App2 | 20 | 26 |
图 11-18显示了最终的人员配置分布图。可以清楚地看到前端之后开发人员的急剧增加以及资源的几乎恒定的利用率。此解决方案中的平均人员配置为 8.9 人,峰值人员配置为 11 人。与之前的压缩迭代相比,模拟器解决方案的持续时间减少了 9%,为 7.1 个月,但总成本仅增加了 1%,为 63.5 人月。这一小笔成本增加是由于间接成本的减少以及团队并行工作时效率和预期吞吐量的提高。
Figure 11-18 shows the resulting staffing distribution chart. You can clearly see the sharp jump in the developers after the front end and the near-constant utilization of the resources. The average staffing in this solution is 8.9 people, with peak staffing of 11 people. Compared with the previous compression iteration, the simulators solution results in a 9% reduction in duration to 7.1 months, but increases the total cost by only 1% to 63.5 man-months. This small cost increase is due to the reduction in the indirect cost and the increased efficiency and expected throughput of the team when working in parallel.
图11-18模拟器解决方案人员分配图
Figure 11-18 The simulators solution staffing distribution chart
图 11-19显示了模拟器解决方案网络图。您可以看到模拟器(活动 26 和 27)和客户端开发(活动 28 和 29)的高浮动时间。还请注意,几乎所有其他网络路径都是关键或接近关键的,并且在项目结束时存在很高的集成压力。该解决方案对不可预见的因素很脆弱,网络复杂性大大增加了执行风险。
Figure 11-19 shows the simulators solution network diagram. You can see the high float for the simulators (activities 26 and 27) and Client development (activities 28 and 29). Also observe that virtually all other network paths are critical or near critical and there is high integration pressure toward the end of the project. This solution is fragile to the unforeseen, and the network complexity drastically increases the execution risk.
图11-19模拟器解决方案网络图
Figure 11-19 The simulators solution network diagram
与常规解决方案相比,模拟器解决方案将进度缩短了 28%,而成本仅增加了 4%。由于间接成本很高,压缩最终几乎是物有所值的。相比之下,直接成本增加了 59%,从百分比的角度来看,是工期缩短的两倍。如第 9 章所述,软件项目的最大预期压缩率最多为 30%,因此模拟器解决方案的压缩率是该项目可能达到的最高水平。
Compared with the normal solution, the simulators solution reduces the schedule by 28% while increasing the cost by only 4%. Due to the high indirect cost, the compression ends up practically paying for itself. The direct cost, by comparison, increases by 59%, twice the reduction in the duration from a percentage standpoint. As noted in Chapter 9, the maximum expected compression of a software project is at most 30%, making the simulators solution as compressed as this project is ever likely to get.
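The percentages in this comparison follow from the duration, total cost, and direct cost of the two solutions as given in this chapter's tables. A short sketch of the arithmetic, with `percent_change` as a helper name made up for the illustration:

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Duration in months, costs in man-months, as given in the chapter's tables.
normal = {"duration": 9.9, "total cost": 61.1, "direct cost": 21.8}
simulators = {"duration": 7.1, "total cost": 63.5, "direct cost": 34.8}

for metric in normal:
    change = percent_change(normal[metric], simulators[metric])
    print(f"{metric}: {change:+.1f}%")
```

The results should land within about a percentage point of the figures quoted above; the inputs are themselves rounded to one decimal place, so small differences are expected.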
虽然理论上可以进一步压缩(通过压缩 Manager 服务),但在实践中,项目设计团队应到此为止。进一步压缩的成功概率很低,核心团队将把时间浪费在设计不太可能实现的项目上。
While further compression is theoretically possible (by compressing the Managers), in practice this is as far as the project design team should go. There is a low probability of success from compressing any further, and the core team will be wasting time designing improbable projects.
重要的是要认识到与正常解决方案相比,压缩如何影响团队的预期吞吐量。如第 7 章所述,浅 S 曲线的倾斜度表示团队的吞吐量。图 11-20以相同的比例绘制了正常解决方案和每个压缩解决方案的计划挣值浅 S 曲线。
It is important to recognize how compression affects the expected throughput of the team compared with the normal solution. As explained in Chapter 7, the pitch of the shallow S curve represents the throughput of the team. Figure 11-20 plots the shallow S curves of the planned earned value for the normal solution and each of the compressed solutions on the same scale.
图 11-20 项目解决方案的计划挣值
Figure 11-20 Planned earned value of the project solutions
正如预期的那样,压缩解决方案具有更陡峭的浅 S,因为它们完成得更快。您可以通过将每条曲线替换为其各自的线性回归趋势线并检查直线方程来量化所需吞吐量的差异(见图11-21)。
As expected, the compressed solutions have a steeper shallow S since they complete sooner. You can quantify the difference in the required throughput by replacing each curve with its respective linear regression trend line and examining the equation of the line (see Figure 11-21).
图 11-21 项目解决方案的挣值趋势线
Figure 11-21 Earned value trend lines for the project solutions
趋势线是直线,因此x项的系数是线的斜度,因此是团队的预期生产率。在正常解决方案中,团队预计以 39 个生产率单位运行,而模拟器解决方案则要求 59 个生产率单位(0.0039 对 0.0059,缩放为整数)。这些生产率单位的确切性质并不重要。重要的是这两个解决方案之间的差异:模拟器解决方案预计团队的生产率将增加 51%(59 - 39 = 20,即 39 的 51%)。任何团队,即使通过扩大规模,也不可能将其生产率提高如此大的倍数。
The trend lines are straight lines, so the coefficient of the x term is the pitch of the line and, therefore, the expected throughput of the team. In the case of the normal solution, the team is expected to operate at 39 units of productivity, while the simulators solution calls for 59 units of productivity (0.0039 versus 0.0059, scaled to integers). The exact nature of these units of productivity is immaterial. What is important is the difference between the two solutions: The simulators solution expects a 51% increase in the throughput of the team (59 – 39 = 20, which is 51% of 39). It is unlikely that any team, even by increasing its size, could increase its throughput by such a large factor.
虽然这不是一个硬性规定,但比较一个解决方案与另一个解决方案的平均人员配置与峰值人员配置比率可以让您了解吞吐量差异是否现实。对于模拟器解决方案,该比率为 81%,而正常解决方案为 68%;换句话说,模拟器解决方案需要更密集地利用资源。由于模拟器解决方案还需要更大的平均团队规模(8.9 对比正常解决方案的 6.1),并且由于更大的团队往往效率较低,因此实现 51% 的吞吐量增长的前景值得怀疑,尤其是在处理更复杂的项目时。这进一步巩固了这样的想法:模拟器解决方案对大多数团队来说门槛太高了。
Although not a hard-and-fast rule, comparing the ratio of average-to-peak staffing from one solution to another can give you some sense whether the throughput difference is realistic. For the simulators solution, this ratio is 81%, compared to 68% for the normal solution; in other words, the simulators solution expects more intense utilization of the resources. Since the simulators solution also requires a larger average team size (8.9 versus 6.1 of the normal solution) and since larger teams tend to be less efficient, the prospect of achieving the 51% increase in throughput is questionable, especially when working on a more complex project. This further cements the idea that the simulators solution is a bar set too high for most teams.
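Both sanity checks used here, the throughput increase implied by the trend-line pitches and the average-to-peak staffing ratio, are simple ratios. The sketch below reproduces them from the numbers quoted in the text; the variable and function names are illustrative only.

```python
# Trend-line pitches (coefficient of x) quoted for the two solutions.
normal_pitch = 0.0039
simulators_pitch = 0.0059

# Required throughput increase of the simulators solution over the normal one.
throughput_increase = (simulators_pitch - normal_pitch) / normal_pitch


def staffing_ratio(average, peak):
    """Average-to-peak staffing ratio; higher means more intense utilization."""
    return average / peak


normal_ratio = staffing_ratio(6.1, 9)
simulators_ratio = staffing_ratio(8.9, 11)

print(f"Throughput increase: {throughput_increase:.0%}")
print(f"Average-to-peak staffing, normal: {normal_ratio:.0%}")
print(f"Average-to-peak staffing, simulators: {simulators_ratio:.0%}")
```

Seeing the two ratios next to the throughput increase makes the argument concrete: the simulators solution asks a larger team to run at a higher utilization while delivering roughly half again as much throughput.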
每个项目设计解决方案的效率都是一个相当容易计算的数字,而且非常有说服力。回想一下第 7 章,效率数字既表明了团队的预期效率,也表明了设计假设在约束、人员配置弹性和项目关键性方面的现实程度。图 11-22显示了示例项目的项目解决方案效率图表。
The efficiency of each project design solution is a fairly easy number to calculate—and a very telling one. Recall from Chapter 7 that the efficiency number indicates both the expected efficiency of the team and how realistic the design assumptions are regarding constraints, staffing elasticity, and criticality of the project. Figure 11-22 shows the project solutions efficiency chart for the example project.
图 11-22项目效率图
Figure 11-22 The project efficiency chart
在图 11-22中可以看到,峰值效率出现在正常解决方案处,这是因为它使用了最低水平的资源,且没有任何压缩成本。随着项目的压缩,效率会下降。虽然模拟器解决方案与正常解决方案相当,但我认为这是不现实的,因为项目要复杂得多,其可行性也值得怀疑(如吞吐量分析所示)。亚临界解决方案效率很差,因为直接成本与间接成本的比率很差。简而言之,正常解决方案是最高效的。
Observe in Figure 11-22 that peak efficiency is at the normal solution, resulting from the lowest level of resources utilization without any compression cost. As you compress the project, efficiency declines. While the simulators solution is on par with the normal solution, I consider it unrealistic since the project is much more complex and its feasibility is in question (as indicated by the throughput analysis). The subcritical solution is awful when it comes to efficiency due to the poor ratio of direct cost to the indirect cost. In short, the normal solution is the most efficient.
设计完各个解决方案并生成其人员分配图后,就可以计算出每个解决方案的成本要素了,如表11-8所示。
Having designed each solution and produced its staffing distribution chart, you can calculate the cost elements for each solution, as shown in Table 11-8.
表 11-8各种方案的持续时间、总成本和成本要素
Table 11-8 Duration, total cost, and cost elements for the various options
| 设计选项 Design Option | 持续时间(月) Duration (months) | 总成本(人月) Total Cost (man-months) | 直接成本(人月) Direct Cost (man-months) | 间接成本(人月) Indirect Cost (man-months) |
|---|---|---|---|---|
| 模拟器 Simulators | 7.1 | 63.5 | 34.8 | 28.7 |
| 基础设施+客户端前端 Infra+Clients Front End | 7.8 | 62.6 | 30.4 | 32.2 |
| 顶级开发者1+顶级开发者2 TopDev1+TopDev2 | 8.5 | 59.3 | 26.6 | 32.7 |
| TopDev2 | 9.5 | 61.1 | 24.2 | 36.9 |
| 普通的 Normal | 9.9 | 61.1 | 21.8 | 39.2 |
| 亚临界 Subcritical | 13.4 | 77.6 | 20.9 | 56.7 |
有了这些成本数字,你就可以绘制出如图 11-23所示的项目时间成本曲线。请注意,由于图表的缩放比例,直接成本曲线有点平坦。间接成本几乎是一条完美的直线。
With these cost numbers, you can produce the project time–cost curves shown in Figure 11-23. Note that the direct cost curve is a bit flat due to the scaling of the chart. The indirect cost is almost a perfect straight line.
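The cost elements in Table 11-8 can be cross-checked with a few lines of code. The sketch below is illustrative only: the names `TABLE_11_8` and `check_cost_elements` are hypothetical, and the 0.15 man-month tolerance is an assumption that allows for rounding in the published numbers. It verifies that total cost is the sum of direct and indirect cost and derives the average staffing as total cost divided by duration.

```python
# Rows of Table 11-8: (duration in months, total, direct, indirect) in man-months.
TABLE_11_8 = {
    "Simulators": (7.1, 63.5, 34.8, 28.7),
    "Infra+Clients Front End": (7.8, 62.6, 30.4, 32.2),
    "TopDev1+TopDev2": (8.5, 59.3, 26.6, 32.7),
    "TopDev2": (9.5, 61.1, 24.2, 36.9),
    "Normal": (9.9, 61.1, 21.8, 39.2),
    "Subcritical": (13.4, 77.6, 20.9, 56.7),
}


def check_cost_elements(table, tolerance=0.15):
    """Check total == direct + indirect (within rounding) and derive average staffing."""
    report = {}
    for name, (duration, total, direct, indirect) in table.items():
        report[name] = {
            "sum_matches": abs(direct + indirect - total) <= tolerance,
            "average_staff": total / duration,  # man-months per month = people
        }
    return report


report = check_cost_elements(TABLE_11_8)
for name, row in report.items():
    print(f"{name}: sum ok = {row['sum_matches']}, average staff = {row['average_staff']:.1f}")
```

The average staffing computed this way for the simulators solution is close to the 8.9 quoted in the throughput discussion, which is a quick sanity check on the table.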
图 11-23项目时间成本曲线
Figure 11-23 The project time–cost curves
图 11-23中的时间成本曲线是离散的,它们只能提示特定解决方案之外的曲线行为。但是,有了离散的时间成本曲线,您还可以找到曲线的相关模型。相关模型或趋势线是一种数学模型,它可以生成最符合离散数据点分布的曲线(Microsoft Excel 等工具可以轻松执行此类分析)。相关模型允许您在任意点绘制时间成本曲线,而不仅仅是已知的离散解。对于图 11-23中的点,这些模型是间接成本的直线,以及直接成本和总成本的二次多项式。图 11-24以虚线显示了这些相关趋势线,以及它们的方程和 R² 值。
The time–cost curves of Figure 11-23 are discrete, and they can only hint at the behavior of the curves outside the specific solutions. However, with the discrete time–cost curves at hand, you can also find correlation models for the curves. A correlation model or a trend line is a mathematical model that produces a curve that best fits the distribution of the discrete data points (tools such as Microsoft Excel can easily perform such analysis). Correlation models allow you to plot the time–cost curves at any point, not just at the known discrete solutions. For the points in Figure 11-23, these models are a straight line for the indirect cost, and a polynomial of the second degree for the direct and total costs. Figure 11-24 shows these correlation trend lines in dashed lines, along with their equations and R² values.
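As a minimal sketch of what a spreadsheet trend line does, the following code fits a straight line to the indirect cost points of Table 11-8 by ordinary least squares and computes the coefficient of determination. The function names `fit_line` and `r_squared` are illustrative, and because the inputs are the rounded table values, the printed coefficients need not match the equations shown in Figure 11-24.

```python
# Indirect cost points from Table 11-8: (duration in months, man-months).
POINTS = [(7.1, 28.7), (7.8, 32.2), (8.5, 32.7),
          (9.5, 36.9), (9.9, 39.2), (13.4, 56.7)]


def fit_line(points):
    """Ordinary least-squares fit of y = slope * t + intercept."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t


def r_squared(points, predict):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y for _, y in points) / len(points)
    ss_res = sum((y - predict(t)) ** 2 for t, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return 1 - ss_res / ss_tot


slope, intercept = fit_line(POINTS)
r2 = r_squared(POINTS, lambda t: slope * t + intercept)
print(f"indirect(t) = {slope:.2f} * t + {intercept:.2f}, R^2 = {r2:.3f}")
```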
图 11-24项目时间成本趋势线
Figure 11-24 The project time–cost trend lines
R²(也称为判定系数)是一个介于 0 和 1 之间的数字,表示模型的质量。大于 0.9 的数字表示模型与离散点的拟合度极佳。在这种情况下,项目设计解决方案范围内的方程非常精确地描绘了它们的曲线。
The R² (also known as the coefficient of determination) is a number between 0 and 1 that represents the quality of the model. Numbers greater than 0.9 indicate an excellent fit of the model to the discrete points. In this case, the equations within the range of the project design solutions depict their curves very precisely.
图 11-24提供了示例项目中成本随时间变化的方程式。对于直接成本和间接成本,方程式如下:
Figure 11-24 provides the equations for how cost changes with time in the example project. For the direct and indirect costs, the equations are:
其中 t 以月为单位。虽然你也有一个总成本的相关性模型,但该模型是通过统计计算得出的,因此它并不是一个完美的直接成本和间接成本之和。只需将直接成本和间接成本的方程式相加,即可生成正确的总成本模型:
where t is measured in months. While you also have a correlation model for the total cost, that model is produced by a statistical calculation, so it is not a perfect sum of the direct and indirect costs. You produce the correct model for the total cost by simply adding the equations of the direct and indirect models:
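The idea of building the total cost model as the exact sum of the direct and indirect models can be sketched as follows. This is an assumption-laden illustration, not the book's calculation: `polyfit` and `evaluate` are hypothetical helpers that solve the least-squares normal equations directly, the data are the rounded values of Table 11-8, and the resulting coefficients may differ from those in Figure 11-24.

```python
DURATION = [7.1, 7.8, 8.5, 9.5, 9.9, 13.4]
DIRECT = [34.8, 30.4, 26.6, 24.2, 21.8, 20.9]
INDIRECT = [28.7, 32.2, 32.7, 36.9, 39.2, 56.7]


def polyfit(ts, ys, degree):
    """Least-squares polynomial fit; returns coefficients, lowest power first."""
    size = degree + 1
    # Augmented normal equations (A^T A | A^T y), where A[i][j] = ts[i] ** j.
    rows = [[sum(t ** (i + j) for t in ts) for j in range(size)]
            + [sum(y * t ** i for t, y in zip(ts, ys))]
            for i in range(size)]
    # Gaussian elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, size):
            factor = rows[r][col] / rows[col][col]
            rows[r] = [x - factor * p for x, p in zip(rows[r], rows[col])]
    # Back substitution.
    coeffs = [0.0] * size
    for i in reversed(range(size)):
        known = sum(rows[i][j] * coeffs[j] for j in range(i + 1, size))
        coeffs[i] = (rows[i][size] - known) / rows[i][i]
    return coeffs


def evaluate(coeffs, t):
    """Evaluate a polynomial given lowest-power-first coefficients."""
    return sum(c * t ** power for power, c in enumerate(coeffs))


direct = polyfit(DURATION, DIRECT, 2)      # [c, b, a] of a*t^2 + b*t + c
indirect = polyfit(DURATION, INDIRECT, 1)  # [intercept, slope]
# Total model: add coefficients of matching powers (the line has no t^2 term).
total = [d + i for d, i in zip(direct, indirect + [0.0])]
print("total cost coefficients (c, b, a):", [round(c, 2) for c in total])
```

Summing coefficients guarantees that the total model equals direct plus indirect at every duration, which a separately fitted total trend line does not.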
图 11-25绘制了修改后的总成本相关模型以及直接和间接模型。
Figure 11-25 plots the modified total cost correlation model along with the direct and indirect models.
图片 11-25项目时间成本模型
Figure 11-25 The project time–cost models
第 9 章介绍了死亡区的概念,即时间成本曲线下的面积。任何落入该区域的项目设计解决方案都不可能实现。有了项目总成本模型(甚至是离散曲线),您就可以直观地看到项目死亡区,如图11-26所示。
Chapter 9 introduced the concept of the death zone—that is, the area under the time–cost curve. Any project design solution that falls in that area is impossible to build. Having the model (or even the discrete curve) for the project total cost enables you to visualize the project death zone, as shown in Figure 11-26.
图 11-26项目死亡地带
Figure 11-26 The project death zone
识别死亡区可以让你快速回答问题,避免承担不可能完成的项目。例如,假设管理层询问你是否可以在 4 个人的条件下在 9 个月内完成示例项目。根据项目的总成本模型,一个为期 9 个月的项目需要花费超过 60 个人月,平均需要 7 个人:
Identifying the death zone allows you to answer quick questions intelligently and avoid committing to impossible projects. For example, suppose management asks if you could build the example project in 9 months with 4 people. According to the total cost model of the project, a 9-month project costs more than 60 man-months, which over 9 months works out to an average of about 7 people.
假设平均人员与峰值人员的比例与正常解决方案相同(68%),那么在 9 个月内交付的解决方案在 10 人时达到峰值。少于 10 人会导致项目有时处于次临界状态。4 个人和 9 个月的组合(即使 100% 的时间以 100% 的效率使用)是 36 个人月的成本。该特定时间成本坐标在图 11-26中甚至不可见,因为它位于死亡区深处。您应该将这些发现呈现给管理层,并询问他们是否愿意在这些条件下做出承诺。
Assuming the same ratio of average-to-peak staffing as the normal solution (68%), a solution that delivers at 9 months peaks at 10 people. Any fewer than 10 people causes the project at times to go subcritical. The combination of 4 people and 9 months (even when utilized at 100% efficiency 100% of the time) is 36 man-months of cost. That particular time–cost coordinate is not even visible in Figure 11-26 because it is so deep within the death zone. You should present these findings to management and ask if they want to commit under these terms.
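The arithmetic behind this answer can be sketched in a few lines. The function name `staffing_check` is hypothetical, and the required cost of 60 man-months is used as a lower bound because the text only states that the 9-month cost is more than 60.

```python
def staffing_check(duration_months, headcount, required_cost, avg_to_peak):
    """Compare what a fixed team can deliver with what the cost model requires."""
    capacity = duration_months * headcount        # best case: 100% utilization
    avg_staff = required_cost / duration_months   # average people needed
    peak_staff = avg_staff / avg_to_peak          # implied peak staffing
    in_death_zone = capacity < required_cost
    return capacity, avg_staff, peak_staff, in_death_zone


# 9 months, 4 people, at least 60 man-months required, 68% average-to-peak ratio.
capacity, avg_staff, peak_staff, in_death_zone = staffing_check(9, 4, 60, 0.68)
print(f"capacity = {capacity} man-months, average staff = {avg_staff:.1f}, "
      f"peak staff = {peak_staff:.1f}, death zone = {in_death_zone}")
```

Even with perfect utilization, the requested combination supplies 36 man-months against a requirement of more than 60, so the request lands inside the death zone.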
每个项目设计方案都存在一定程度的风险。使用第 10 章中描述的风险建模技术,您可以量化解决方案的风险级别,如表 11-9所示。
Each project design solution carries some level of risk. Using the risk modeling techniques described in Chapter 10, you can quantify these risk levels for the solutions, as shown in Table 11-9.
表 11-9各种方案的风险等级
Table 11-9 Risk levels of the various options
| 设计选项 Design Option | 持续时间(月) Duration (months) | 危急风险 Criticality Risk | 活动风险 Activity Risk |
|---|---|---|---|
| 模拟器 Simulators | 7.1 | 0.81 | 0.76 |
| 基础设施+客户端前端 Infra+Clients Front End | 7.8 | 0.77 | 0.81 |
| 顶级开发者1+顶级开发者2 TopDev1+TopDev2 | 8.5 | 0.79 | 0.80 |
| TopDev2 | 9.5 | 0.70 | 0.77 |
| 普通的 Normal | 9.9 | 0.73 | 0.79 |
| 亚临界 Subcritical | 13.4 | 0.79 | 0.79 |
图 11-27绘制了项目设计方案的风险水平以及直接成本曲线。该图提供了有关风险的好消息和坏消息。好消息是,该项目中的关键风险和活动风险密切相关。当不同的模型对数字一致时,这始终是一个好兆头,使这些值可信。坏消息是,到目前为止,所有项目设计解决方案都是高风险选项;更糟糕的是,它们的价值都相似。这意味着无论采用哪种解决方案,风险都会升高且一致。另一个问题是,图 11-27包含亚临界点。亚临界解决方案绝对是一种需要避免的解决方案,您应该将其从此分析和任何后续分析中删除。
Figure 11-27 plots the risk levels of the project design options along with the direct cost curve. The figure offers both good and bad news regarding risk. The good news is that criticality risk and activity risk track closely together in this project. It is always a good sign when different models concur on the numbers, giving credence to the values. The bad news is that all project design solutions so far are high-risk options; even worse, they are all similar in value. This means the risk is elevated and uniform regardless of the solution. Another problem is that Figure 11-27 contains the subcritical point. The subcritical solution is definitely a solution to avoid, and you should remove it from this and any subsequent analysis.
图片 11-27各种方案的直接成本和风险
Figure 11-27 Direct cost and risk of the various options
总体而言,您应避免将建模建立在糟糕的设计方案之上。为了应对高风险,您应该对项目进行减压。
In general, you should avoid basing your modeling on poor design options. To address the high risk, you should decompress the project.
由于本示例项目中所有项目设计解决方案的风险都很高,因此您应该对常规解决方案进行减压,并沿关键路径注入浮动,直到风险降至可接受的水平。对项目进行减压是一个迭代过程,因为您事先并不知道要减压多少,也不知道项目对减压的响应程度如何。
Since in this example project all project design solutions are of high risk, you should decompress the normal solution and inject float along the critical path until the risk drops to an acceptable level. Decompressing a project is an iterative process because you do not know up front by how much to decompress or how well the project will respond to decompression.
第一次迭代有些随意地将项目减压 3.5 个月,从正常解决方案的 9.9 个月减压到亚临界解决方案的最远点。这个结果揭示了项目在整个解决方案持续时间范围内的响应情况。这样做会产生一个称为 D1 的减压点(项目总持续时间为 13.4 个月),其临界风险为 0.29,活动风险为 0.39。如第 10 章所述,0.3 应该是任何项目的最低风险级别,这意味着这次迭代对项目减压过度。
Somewhat arbitrarily, the first iteration decompresses the project by 3.5 months, from the 9.9 months of the normal solution to the furthest point of the subcritical solution. This result reveals how the project responds across the entire range of solution durations. Doing so produces a decompression point called D1 (total project duration of 13.4 months) with criticality risk of 0.29 and activity risk of 0.39. As explained in Chapter 10, 0.3 should be the lowest risk level for any project, which implies this iteration overly decompressed the project.
下一次迭代将使项目从正常持续时间中减压 2 个月,大约是 D1 减压量的一半。这产生了 D2(项目总持续时间为 12 个月)。临界风险保持不变,为 0.29,因为这两个月的减压仍然大于该项目用于绿色活动的下限。活动风险增加到 0.49。
The next iteration decompresses the project by 2 months from the normal duration, roughly half the decompression amount of D1. This produces D2 (total project duration of 12 months). The criticality risk remains unchanged at 0.29, because these 2 months of decompression are still larger than the lower limit used in this project for green activities. Activity risk increases to 0.49.
类似地,将 D2 的减压量减半得到 D3,减压时间为 1 个月(项目总工期为 10.9 个月),临界风险为 0.43,活动风险为 0.62。将 D3 的减压量减半得到 D4,减压时间为 2 周(项目总工期为 10.4 个月),临界风险为 0.45,活动风险为 0.7。图 11-28绘制了该项目的减压风险曲线。
Similarly, halving the decompression of D2 yields D3 with a 1-month decompression (total project duration of 10.9 months), criticality risk of 0.43, and activity risk of 0.62. Half of D3 produces D4, a 2-week decompression (total project duration of 10.4 months) with criticality risk of 0.45 and activity risk at 0.7. Figure 11-28 plots the decompressed risk curves for the project.
图 11-28风险减压曲线
Figure 11-28 Risk decompression curves
图 11-28显示了两种风险模型之间的明显差距。这种差异是由于活动风险模型的局限性造成的,即当项目中的浮动不均匀分布时,活动风险模型无法正确计算风险值(有关详细信息,请参阅第 10 章)。在解压缩解决方案的情况下,测试计划和测试工具的高浮动值使活动风险值偏高。这些高浮动值与所有浮动值的平均值相差一个标准差以上,使其成为异常值。
Figure 11-28 features a conspicuous gap between the two risk models. This difference is due to a limitation of the activity risk model—namely, the activity risk model does not compute the risk values correctly when the floats in the project are not spread uniformly (see Chapter 10 for more details). In the case of the decompressed solutions, the high float values of the test plan and the test harness skew the activity risk values higher. These high float values are more than one standard deviation removed from the average of all the floats, making them outliers.
在计算减压点的活动风险时,您可以通过将异常活动的浮点数替换为所有浮点数的平均值加上所有浮点数的一个标准差来调整输入。使用电子表格,您可以轻松地自动调整异常值。这样的调整通常会使风险模型更加紧密地相关。
When computing activity risk at the decompression points, you can adjust the input by replacing the float of the outlier activities with the average of all floats plus one standard deviation of all floats. Using a spreadsheet, you can easily automate the adjustment of the outliers. Such an adjustment typically makes the risk models correlate more closely.
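The outlier adjustment can be automated along these lines. This is a sketch under stated assumptions: the float values are made up for illustration, `adjust_outlier_floats` is a hypothetical name, and the population standard deviation (`statistics.pstdev`) is used because the text does not say whether the population or sample form is intended.

```python
from statistics import mean, pstdev


def adjust_outlier_floats(floats):
    """Cap any float above (average + one standard deviation) at that value."""
    cap = mean(floats) + pstdev(floats)
    return [min(f, cap) for f in floats]


# Hypothetical activity floats in days; the last one is a clear outlier.
floats = [5, 10, 10, 15, 20, 95]
adjusted = adjust_outlier_floats(floats)
print([round(f, 1) for f in adjusted])
```

The adjusted list, rather than the raw floats, then feeds the activity risk calculation.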
图 11-29显示了调整后的活动风险曲线以及关键性风险曲线。如您所见,这两个风险模型现在一致了。
Figure 11-29 shows the adjusted activity risk curve along with the criticality risk curve. As you can see, the two risk models now concur.
图 11-29危急程度和调整的活动风险减压曲线
Figure 11-29 Criticality and adjusted activity risk decompression curves
图 11-29中最重要的方面是 D4 附近的风险临界点。即使只将项目稍微减压到 D4,风险也会大幅降低。由于 D4 正好处于临界点的边缘,因此您应该更保守一点,将项目减压到 D3,以越过曲线中的拐点。
The most important aspect of Figure 11-29 is the risk tipping point around D4. Decompressing the project even by a little to D4 decreases the risk substantially. Since D4 is right at the edge of the tipping point, you should be a bit more conservative and decompress to D3 to pass the knee in the curves.
要将减压解决方案与其他解决方案进行比较,您需要知道它们各自的成本。问题是减压点仅提供持续时间和风险。没有项目设计解决方案会产生这些点 - 它们只是具有额外浮动的正常解决方案网络的风险值。您必须从已知解决方案中推断出减压解决方案的间接和直接成本。
To compare the decompressed solutions to the other solutions, you need to know their respective cost. The problem is that the decompression points provide only the duration and the risk. No project design solutions produce these points—they are just the risk value of the normal solution network with additional float. You have to extrapolate both the indirect and direct cost of the decompressed solutions from the known solutions.
在此示例项目中,间接成本模型是一条直线,您可以安全地从其他项目设计解决方案(不包括亚临界解决方案)中推断出间接成本。例如,推断D1得出的间接成本为 51.1 人月。
In this example project, the indirect cost model is a straight line, and you can safely extrapolate the indirect cost from that of the other project design solutions (excluding the subcritical solution). For example, the extrapolation for D1 yields an indirect cost of 51.1 man-months.
直接成本外推需要处理延迟的影响。额外的直接成本(超出用于创建解压缩解决方案的正常解决方案)来自更长的关键路径和非关键活动之间更长的空闲时间。由于人员配置不是完全动态或弹性的,因此当发生延迟时,通常意味着其他链上的人员处于闲置状态,等待关键活动赶上进度。
The direct cost extrapolation requires dealing with the effect of delays. The additional direct cost (beyond the normal solution that was used to create the decompressed solutions) comes from both the longer critical path and the longer idle time between noncritical activities. Because staffing is not fully dynamic or elastic, when a delay occurs, it often means people on other chains are idle, waiting for the critical activities to catch up.
在示例项目的正常解决方案中,前端之后,直接成本主要由开发人员组成。直接成本的其他贡献者是测试工程师活动和最终系统测试。由于测试工程师有非常大的浮动时间,因此您可以假设测试工程师不会受到进度延误的影响。正常解决方案的人员分布(如图 11-11所示)表明,人员配置在 3 名开发人员时达到峰值(即使该峰值也不会维持很长时间),最低只有 1 名开发人员。从表 11-3中,您可以看到正常解决方案平均使用 2.3 名开发人员。因此,您可以假设减压会影响两名开发人员。其中一名消耗了额外的减压浮动时间,而另一名最终闲置。
In the example project’s normal solution, after the front end, the direct cost mostly consists of developers. The other contributors to direct cost are the test engineer activities and the final system testing. Since the test engineer has very large float, you can assume that the test engineer will not be affected by the schedule slip. The staffing distribution for the normal solution (shown in Figure 11-11) indicates that staffing peaks at 3 developers (and even that peak is not maintained for long) and goes as low as 1 developer. From Table 11-3, you can see that the normal solution uses 2.3 developers, on average. You can therefore assume that the decompression affects two developers. One of them consumes the extra decompression float, and the other one ends up idle.
本项目的规划假设规定,活动之间的开发人员应计入直接成本。因此,当项目出现延误时,延误将增加两名开发人员的直接成本乘以正常解决方案和减压点之间的持续时间差异。对于最远的减压点 D1(13.4 个月),与正常解决方案(9.9 个月)的持续时间差异为 3.5 个月,因此额外的直接成本为 7 个人月。由于正常解决方案的直接成本为 21.8 个人月,因此 D1 的直接成本为 28.8 个人月。您可以通过执行类似的计算来添加其他减压点。图 11-30显示了修改后的直接成本曲线以及风险曲线。
The planning assumptions in this project stipulate that developers between activities are accounted for as a direct cost. Thus, when the project slips, the slip adds the direct cost of two developers times the difference in duration between the normal solution and the decompression point. In the case of the furthest decompression point D1 (at 13.4 months), the difference in duration from the normal solution (at 9.9 months) is 3.5 months, so the additional direct cost is 7 man-months. Since the normal solution has 21.8 man-months for its direct cost, the direct cost at D1 is 28.8 man-months. You can add the other decompression points by performing similar calculations. Figure 11-30 shows the modified direct-cost curve along with the risk curves.
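The "similar calculations" for the other decompression points follow the same rule: normal direct cost plus two developers times the slip. The sketch below applies it to the four decompression durations given in the text; the constant and function names are illustrative.

```python
NORMAL_DURATION = 9.9       # months, normal solution
NORMAL_DIRECT_COST = 21.8   # man-months, normal solution
AFFECTED_DEVELOPERS = 2     # one consumes the extra float, one ends up idle

# Total durations of the decompression points, in months, as stated in the text.
DECOMPRESSION_POINTS = {"D1": 13.4, "D2": 12.0, "D3": 10.9, "D4": 10.4}


def decompressed_direct_cost(duration):
    """Direct cost of a decompressed solution under the two-developer assumption."""
    slip = duration - NORMAL_DURATION
    return NORMAL_DIRECT_COST + AFFECTED_DEVELOPERS * slip


for name, duration in DECOMPRESSION_POINTS.items():
    cost = decompressed_direct_cost(duration)
    print(f"{name}: {duration} months -> {cost:.1f} man-months direct cost")
```

For D1 this reproduces the 28.8 man-months derived above.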
图片 11-30修改后的直接成本曲线和风险曲线
Figure 11-30 Modified direct cost curve and the risk curves
使用 D1 的新成本数字,您可以重建时间成本曲线,同时排除亚临界解决方案的坏数据点。这将根据可能的解决方案产生更好的时间成本曲线。然后,您可以像以前一样继续计算相关模型。此过程产生以下成本公式:
With the new cost numbers for D1, you can rebuild the time–cost curves, while excluding the bad data point of the subcritical solution. This yields a better time–cost curve based on possible solutions. You can then proceed to calculate the correlation models as before. This process produces the following cost formulas:
这些曲线的 R² 为 0.99,表明与数据点非常吻合。图 11-31显示了新的时间成本曲线模型以及最小总成本点和正常解决方案。
These curves have an R² of 0.99, indicating an excellent fit to the data points. Figure 11-31 shows the new time–cost curve models as well as the points of minimum total cost and the normal solution.
图片 11-31重建的时间成本曲线模型
Figure 11-31 Rebuilt time–cost curve models
现在有了更好的总成本公式,您可以计算项目的最低总成本点。总成本模型是二阶多项式,形式如下:
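With a better total cost formula now known, you can calculate the point of minimum total cost for the project. The total cost model is a second-order polynomial of the form:

$$y = at^2 + bt + c$$

where $t$ is the duration in months and $a$, $b$, and $c$ are the fitted coefficients.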
回想一下微积分,这样的多项式的最小点是当其一阶导数为零时:
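Recall from calculus that the minimum point of such a polynomial, $y = at^2 + bt + c$ with $a > 0$, is where its first derivative is zero:

$$y' = 2at + b = 0 \quad\Rightarrow\quad t_{min} = -\frac{b}{2a}$$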
如第 9 章所述,最小总成本点总是向正常解决方案的左侧移动。虽然最小总成本的确切解决方案未知,但第 9 章表明,对于大多数项目来说,寻找该点并不值得付出努力。相反,为了简单起见,您可以将正常解决方案的总成本等同于项目的最小总成本。在这种情况下,最小总成本为 60.3 人月,而根据模型,正常解决方案的总成本为 61.2 人月,相差 1.5%。显然,在这种情况下简化假设是合理的。如果您的目标是最小化总成本,那么正常解决方案和具有单个顶级开发人员的第一个压缩解决方案都是可行的选择。
As discussed in Chapter 9, the point of minimum total cost always shifts to the left of the normal solution. While the exact solution of minimum total cost is unknown, Chapter 9 suggested that for most projects finding that point is not worth the effort. Instead, you can, for simplicity’s sake, equate the total cost of the normal solution with the minimum total cost for the project. In this case, the minimum total cost is 60.3 man-months and the total cost of the normal solution according to the model is 61.2 man-months, a difference of 1.5%. Clearly, the simplification assumption is justified in this case. If minimizing the total cost is your goal, then both the normal solution and the first compression solution with a single top developer are viable options.
按照直接成本公式的类似步骤,您可以轻松计算出最低直接成本的时间点为 10.8 个月。理想情况下,正常解决方案也是最低直接成本点。然而,在示例项目中,正常解决方案是在 9.9 个月。这种差异部分是由于项目的离散模型和连续模型之间的差异造成的(见图11-30,其中正常解决方案也是最低直接成本,而图 11-31则为最低直接成本)。更有意义的原因是,由于重建时间成本曲线以适应风险减压点,该点已经发生了移动。在实践中,由于适应限制,正常解决方案通常会与最低直接成本点略有偏移。本章的其余部分使用 10.8 个月的持续时间作为最低直接成本的确切点。
Following similar steps with the direct cost formula, you can easily calculate the point in time of minimum direct cost as 10.8 months. Ideally, the normal solution is also the point of minimum direct cost. In the example project, however, the normal solution is at 9.9 months. The discrepancy is partially due to the differences between a discrete model of the project and the continuous model (see Figure 11-30, where the normal solution is also minimum direct cost, versus Figure 11-31). A more meaningful reason is that the point has shifted due to rebuilding the time–cost curve to accommodate the risk decompression points. In practice, the normal solution is often offset a little from the point of minimum direct cost due to accommodating constraints. The rest of this chapter uses the duration of 10.8 months as the exact point of minimum direct cost.
现在,您可以为离散风险模型创建趋势线模型,如图 11-32所示。在此图中,两条趋势线非常相似。本章的其余部分使用活动风险趋势线,因为它更为保守:它在几乎所有期权范围内都较高。
You can now create trend line models for the discrete risk models, as shown in Figure 11-32. In this figure, the two trend lines are fairly similar. The rest of the chapter uses the activity risk trend line because it is more conservative: It is higher across almost all of the range of options.
图片 11-32项目时间-风险趋势线
Figure 11-32 The project time–risk trend lines
拟合多项式相关模型,您现在得到了项目风险的公式:
Fitting a polynomial correlation model, you now have a formula for risk in the project:
其中 t 以月为单位。
where t is measured in months.
利用风险公式,您可以将风险模型与直接成本模型并排绘制出来,如图11-33所示。
With the risk formula you can plot the risk model side by side with the direct cost model, as in Figure 11-33.
图片 11-33风险和直接成本模型
Figure 11-33 Risk and direct cost models
如前所述,直接成本模型的最小值为 10.8 个月。将该时间值代入风险公式可得出风险值为 0.52;也就是说,直接成本最小点的风险为 0.52。图 11-33用蓝色虚线将其可视化。
As mentioned earlier, the minimum of the direct cost model is at 10.8 months. Substituting that time value into the risk formula yields a risk value of 0.52; that is, the risk at the point of minimum direct cost is 0.52. Figure 11-33 visualizes this with the dashed blue lines.
回想一下第 10 章,理想情况下,最低直接成本应为 0.5 风险,并且此点是推荐的减压目标。示例项目与该目标相差 4%。虽然该项目没有持续时间恰好为 10.8 个月的项目设计解决方案,但已知的 D3 减压点很接近,持续时间为 10.9 个月(参见图 11-33中的红色虚线)。从实际意义上讲,这些点是相同的。
Recall from Chapter 10 that ideally the minimum direct cost should be at 0.5 risk and that this point is the recommended decompression target. The example project is off that mark by 4%. While this project does not have a project design solution with a duration of exactly 10.8 months, the known D3 decompression point comes close, with a duration of 10.9 months (see the dashed red lines in Figure 11-33). In a practical sense, these points are identical.
D3 处的风险模型值为 0.50,这意味着 D3 是风险减压的理想目标,也是实际上直接成本最低的点。这使得 D3 成为直接成本、工期和风险方面的最优点。D3 的总成本仅为 63.8 人月,几乎与最低总成本相同。这也使 D3 成为总成本、工期和风险方面的最优点。
The risk model value at D3 is at 0.50, meaning that D3 is the ideal target of risk decompression as well as practically the point of minimum direct cost. This makes D3 the optimal point in terms of direct cost, duration, and risk. The total cost at D3 is only 63.8 man-months, virtually the same as the minimum total cost. This also makes D3 the optimal point in terms of total cost, duration, and risk.
处于最优点意味着项目设计方案最有可能实现计划的承诺(成功的定义)。你应该始终努力围绕最小直接成本点进行设计。图 11-34显示了项目网络在 D3 时的浮动情况。如你所见,网络是一幅健康的图景。
Being the optimal point means that the project design option has the highest probability of delivering on the commitments of the plan (the very definition of success). You should always strive to design around the point of minimum direct cost. Figure 11-34 visualizes the float of the project network at D3. As you can see, the network is a picture of health.
图片 11-34最优设计方案的浮动分析
Figure 11-34 Float analysis of the optimal design solution
使用风险公式,您还可以计算出最小风险点。该点位于 12.98 个月,风险值为 0.248。第 10 章解释了关键性风险模型的最小风险值为 0.25(使用权重 [1、2、3、4])。虽然 0.248 非常接近 0.25,但它是使用活动风险公式得出的,与关键性风险模型不同,该公式不受权重选择的影响。
Using the risk formula, you can also calculate the point of minimum risk. This point comes at 12.98 months and a risk value of 0.248. Chapter 10 explained that the minimum risk value for the criticality risk model is 0.25 (using the weights [1, 2, 3, 4]). While 0.248 is very close to 0.25, it was produced using the activity risk formula, which, unlike the criticality risk model, is unaffected by the choice of weights.
图 11-29的离散风险曲线表明,虽然压缩缩短了项目,但并不一定会大幅增加风险。压缩此示例项目甚至在活动风险曲线上稍微降低了风险。风险的主要增加是由于向左移动D3(或最低直接成本),并且所有压缩解决方案都具有高风险。
The discrete risk curve of Figure 11-29 indicates that while compression shortens the project, it does not necessarily increase the risk substantially. Compressing this example project even reduced the risk a bit on the activity risk curve. The main increase in risk was due to moving left of D3 (or minimum direct cost), and all compressed solutions have high risk.
图 11-35通过使用风险模型显示了所有项目设计方案如何映射到项目的风险曲线。你可以看到,第二个压缩方案几乎处于最大风险,而压缩程度更高的方案具有预期的降低风险水平(第 10 章中介绍的达芬奇效应)。显然,将任何东西设计到最大风险点或超过最大风险点都是不明智的。你甚至应该避免接近项目的最大风险点——但这个临界点在哪里?示例项目在风险曲线上的最大风险值为 0.85,因此接近该数字的项目设计方案不是好选择。第 10 章建议将 0.75 作为任何解决方案的最大风险水平。如果风险高于 0.75,则项目通常很脆弱,并且可能会错过进度。
Using the risk model, Figure 11-35 shows how all the project design solutions map to the risk curve of the project. You can see that the second compressed solution is almost at maximum risk, and that the more compressed solutions have the expected decreased level of risk (the da Vinci effect introduced in Chapter 10). Obviously, designing anything to or past the point of maximum risk is ill advised. You should avoid even approaching the point of maximum risk for the project—but where is that cutoff point? The example project has a maximum risk value of 0.85 on the risk curve, so project design solutions approaching that number are not good options. Chapter 10 suggested 0.75 as the maximum level of risk for any solution. With risk higher than 0.75, the project is typically fragile and likely to slip the schedule.
图 11-35所有设计方案及风险
Figure 11-35 All design solutions and risk
使用风险公式,您会发现 0.75 风险点的持续时间为 9.49 个月。虽然没有项目设计解决方案完全符合这一点,但第一个压缩点的持续时间为 9.5 个月,风险为 0.75。这表明,在这个示例项目中,第一次压缩是实际上限。如前所述,0.3 应该是最低风险水平,这排除了风险为 0.27 的 D1 减压点。风险为 0.32 的 D2 减压点是可能的,但处于临界值。
Using the risk formula, you will find that the point of 0.75 risk is at a duration of 9.49 months. While no project design solution exactly matches this point, the first compression point has a duration of 9.5 months and risk of 0.75. This suggests that the first compression is the upper practical limit in this example project. As discussed previously, 0.3 should be the lowest level of risk, which excludes the D1 decompression point at 0.27 risk. The D2 decompression point at 0.32 is possible, but borderline.
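The screening described here amounts to a simple filter on the risk values. In the sketch below, the durations and risk numbers are the ones quoted in this chapter, while the names and the choice of inclusive bounds (so that a risk of exactly 0.75 still passes) are assumptions made for illustration.

```python
MIN_RISK, MAX_RISK = 0.30, 0.75

# (option, duration in months, risk) as quoted in the text.
OPTIONS = [
    ("First compression (one top developer)", 9.5, 0.75),
    ("Normal solution", 9.9, 0.68),
    ("D3 decompression", 10.9, 0.50),
    ("D2 decompression", 12.0, 0.32),
    ("D1 decompression", 13.4, 0.27),
]


def viable(options, low=MIN_RISK, high=MAX_RISK):
    """Keep the options whose risk lies within the recommended band."""
    return [name for name, _, risk in options if low <= risk <= high]


print(viable(OPTIONS))
```

D1 drops out for being below the 0.3 floor, and D2 survives only narrowly, matching the "possible, but borderline" assessment.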
所有详细的项目设计工作最终都会在 SDP 评审中结束,在此过程中,您将向决策者展示项目设计选项。您不仅应该推动明智的决策,还应该让正确的选择显而易见。迄今为止最好的项目设计方案是 D3,即为期一个月的减压方案,它同时提供最低成本和 0.50 的风险。
All the detailed project design work culminates in the SDP review, where you present the project design options to the decision makers. You should not only drive educated decisions, but also make the right choice obvious. The best project design option so far was D3, a one-month decompression offering both minimum cost and risk of 0.50.
在向决策者展示结果时,列出第一个压缩点、正常解决方案以及从正常解决方案开始的最佳一个月减压。表 11-10总结了这些可行的项目设计选项。
When presenting your results to decision makers, list the first compression point, the normal solution, and the optimal one-month decompression from the normal solution. Table 11-10 summarizes these viable project design options.
表 11-10可行的项目设计选项
Table 11-10 Viable project design options
| 设计选项 Design Option | 持续时间(月) Duration (months) | 总成本(人月) Total Cost (man-months) | 风险 Risk |
|---|---|---|---|
| 一位顶级开发商 One Top Developer | 9.5 | 61.1 | 0.75 |
| 正常解决方案 Normal Solution | 9.9 | 61.1 | 0.68 |
| 一个月减压计划 One-Month Decompression | 10.9 | 63.8 | 0.50 |
您实际上不应该像表 11-10中那样提供原始信息。不太可能有人见过这种精度水平,因此您缺乏可信度。您还应该添加次临界解决方案。由于次临界解决方案最终成本更高(并且有风险且耗时更长),您希望尽早消除这种想法。
You should not actually present the raw information as shown in Table 11-10. It is unlikely that anyone has ever seen this level of precision, so you will lack credibility. You should also add the subcritical solution. Since the subcritical solution ends up costing more (as well as being risky and taking longer), you want to dispel such notions as early as possible.
表 11-11列出了您应在 SDP 评审中提出的项目选项。请注意四舍五入的进度和成本数字。四舍五入有点夸张,以创建更明显的差异。虽然这不会改变决策过程,但它确实为数字提供了更多的可信度。
Table 11-11 lists the project options you should present at the SDP review. Note the rounded schedule and cost numbers. The rounding was performed with a bit of a license to create a more prominent spread. While this will not change anything in the decision-making process, it does lend more credibility to the numbers.
表 11-11待审查的项目设计选项
Table 11-11 Project design options for review
| 项目选项 Project Option | 持续时间(月) Duration (months) | 总成本(人月) Total Cost (man-months) | 风险 Risk |
|---|---|---|---|
| 一位顶级开发商 One Top Developer | 9 | 61 | 0.75 |
| 正常解决方案 Normal Solution | 10 | 62 | 0.68 |
| 一个月减压计划 One-Month Decompression | 11 | 64 | 0.50 |
| 亚临界人员配置 Subcritical Staffing | 13 | 77 | 0.78 |
风险数字没有四舍五入,因为风险是评估选项的最佳方式。几乎可以肯定的是,决策者从未见过风险量化形式作为推动明智决策的工具。你必须向他们解释风险值是非线性的;也就是说,使用表 11-11中的数字,0.68 风险比 0.5 风险风险高得多,而不仅仅是增加 36%。为了说明非线性行为,你可以将风险与更熟悉的非线性域(里氏地震强度)进行类比。如果风险数字是里氏地震的等级,那么 6.8 级地震的威力是 5.0 级地震的 500 倍,7.5 级地震的威力是 5623 倍。这种简单的类比将决策引向期望的 0.50 风险点。
The risk numbers are not rounded because risk is the best way of evaluating the options. It is nearly certain that the decision makers have never seen risk in a quantified form as a tool for driving educated decisions. You must explain to them that the risk values are nonlinear; that is, using the numbers from Table 11-11, 0.68 risk is a lot more risky than 0.5 risk, not a mere 36% increase. To illustrate nonlinear behavior, you can use an analogy between risk and a more familiar nonlinear domain, the Richter scale for earthquake strength. If the risk numbers were levels of earthquakes on the Richter scale, an earthquake of 6.8 is 500 times more powerful than an earthquake of 5.0 magnitude, and a 7.5 quake is 5623 times more powerful. This sort of plain analogy steers the decision toward the desired point of 0.50 risk.
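The multipliers in the earthquake analogy are consistent with the common rule that each whole step in magnitude corresponds to roughly 10^1.5 times more energy. The sketch below assumes that rule; the text itself does not state which formula was used, and `energy_ratio` is an illustrative name.

```python
def energy_ratio(magnitude, baseline):
    """Relative energy of two earthquakes, assuming energy scales as 10 ** (1.5 * M)."""
    return 10 ** (1.5 * (magnitude - baseline))


for risk in (0.68, 0.75):
    magnitude = risk * 10  # map the risk value onto the magnitude scale
    print(f"risk {risk} vs 0.50: about {energy_ratio(magnitude, 5.0):,.0f} times")
```

By contrast, the linear comparison of 0.68 with 0.50 is only a 36% increase, which is exactly the intuition the analogy is meant to correct.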
项目设计是一门复杂的学科,前面的章节只介绍了基本概念。这是经过深思熟虑的,以便让学习曲线适中。项目设计还有更多内容,在本章中,您将发现几乎任何项目都适用的其他技术,而不仅仅是非常大或非常复杂的项目。这些技术的共同点是,它们可以让您更好地处理风险和复杂性。您还将了解如何成功处理最具挑战性和最复杂的项目。
Project design is an intricate discipline, and the previous chapters covered only the basic concepts. This was deliberate, to allow for a moderate learning curve. There is more to project design, and in this chapter you will find additional techniques useful in almost any project, not just the very large or the very complex. What these techniques have in common is that they give you a better handle on risk and complexity. You will also see how to successfully handle even the most challenging and complex projects.
顾名思义,上帝活动是指对于您的项目来说规模过大的活动。“过大”可能是一个相对术语,即某个上帝活动相对于项目中的其他活动而言规模过大。判断此类上帝活动的一个简单标准是,该活动的持续时间与项目中所有活动的平均持续时间至少相差一个标准差。但上帝活动在绝对意义上可能过大。对于典型的软件项目而言,持续时间在 40-60 天(或更长)范围内的活动规模过大。
As the name implies, god activities are activities too large for your project. “Too large” could be a relative term, when a god activity is too large with respect to other activities in the project. A simple criterion for such a god activity is a duration that exceeds the average duration of all the activities in the project by at least one standard deviation. But god activities can also be too large in absolute terms. Durations in the 40–60 days range (or longer) are too large for a typical software project.
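Both criteria are easy to apply mechanically once you have the activity durations. The following sketch uses made-up activity names and durations, a hypothetical `find_god_activities` helper, and the population standard deviation; it takes the lower end of the 40–60 day range as the absolute limit, which is an assumption.

```python
from statistics import mean, pstdev

ABSOLUTE_LIMIT_DAYS = 40  # assumed cutoff: lower end of the 40-60 day range


def find_god_activities(durations):
    """Flag activities that are too large in relative or absolute terms."""
    values = list(durations.values())
    relative_limit = mean(values) + pstdev(values)
    return [name for name, days in durations.items()
            if days >= relative_limit or days >= ABSOLUTE_LIMIT_DAYS]


# Hypothetical activity durations in days.
durations = {
    "Logging": 5,
    "Messaging": 10,
    "Security": 10,
    "Client App": 15,
    "Legacy Migration": 60,
}
print(find_god_activities(durations))
```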
你的直觉和经验可能已经告诉你要避免这样的活动。通常,上帝活动只是隐藏在幕后的巨大不确定性的占位符。上帝活动的持续时间和工作量估计几乎总是低级的。因此,实际活动持续时间可能更长,可能足以使项目脱轨。你应该尽快面对这样的危险,以确保你有机会履行承诺。
Your intuition and experience may already tell you to avoid such activities. Typically, god activities are mere placeholders for some great uncertainty lurking under the cover. The duration and effort estimation for a god activity is almost always low-grade. Consequently, the actual activity duration may be longer, potentially enough to derail the project. You should confront such dangers as soon as possible to ensure that you have a chance of meeting your commitments.
上帝活动也倾向于扭曲本书中展示的项目设计技术。它们几乎总是关键路径的一部分,使得大多数关键路径管理技术无效,因为关键路径的持续时间及其在网络中的位置倾向于上帝活动。更糟糕的是,具有上帝活动的项目的风险模型会导致误导性的风险数字。此类项目中的大部分努力将花在关键的上帝活动,使得该项目在所有实际目的上都是高风险项目。但是,风险计算将会偏低,因为围绕关键上帝活动的其他活动将具有高浮动。如果删除这些卫星活动,风险数字将飙升至 1.0,正确表明上帝活动导致的高风险。
God activities also tend to deform the project design techniques shown in this book. They are almost always part of the critical path, rendering most critical path management techniques ineffective because the critical path’s duration and its position in the network gravitate toward the god activities. To make matters worse, the risk models for projects with god activities result in misleadingly low risk numbers. Most of the effort in such a project will be spent on the critical god activities, making the project for all practical purposes a high-risk project. However, the risk calculation will be skewed lower because the other activities orbiting the critical god activities will have high float. If you removed these satellite activities, the risk number would shoot up toward 1.0, correctly indicating the high risk resulting from the god activities.
处理神级活动的最佳方法是将其分解为较小的独立活动。细分神级活动将显著提高估算质量、降低不确定性并提供正确的风险值。但如果工作范围真的很大怎么办?您应该将这些活动视为小型项目并对其进行压缩。首先确定神级活动的内部阶段,然后找到在每个神级活动内部跨这些阶段并行工作的方法。如果这不可能,您应该寻找方法,使神级活动不再妨碍项目中的其他活动,从而降低其重要性。
The best course of action with god activities is to break them down into smaller independent activities. Subdividing god activities will markedly improve the quality of the estimation, reduce the uncertainty, and provide the correct risk value. But what if the scope of work is truly huge? You should treat such activities as mini-projects and compress them. Start by identifying internal phases of the god activities and finding ways of working in parallel across these phases inside each god activity. If that is not possible, you should look for ways of making the god activities less critical by getting them out of the way of the other activities in the project.
例如,为神活动开发模拟器可以减少其他活动对神活动本身的依赖。这将使工作能够与神活动并行,使神活动变得不那么重要(甚至根本不重要)。模拟器还可以通过对神活动施加限制来揭示隐藏的假设,从而减少神活动的不确定性,使神活动的详细设计更容易。
For instance, developing simulators for the god activities reduces other activities’ dependencies on the god activities themselves. This will enable working in parallel to the god activities, making the god activities less (or maybe not at all) critical. Simulators also reduce the uncertainty of the god activities by placing constraints on them that reveal hidden assumptions, making the detailed design of the god activities easier.
You should also consider ways of factoring the god activities into separate side projects. Factoring into a side project is important especially if the internal phases of the god activity are inherently sequential. This makes project management and progress tracking much easier. You must design integration points along the network to reduce the integration risk at the end. Extracting the god activities this way tends to increase the risk in the rest of the project (the other activities have much less float once the god activities are extracted). This is typically a good thing because the project would otherwise have deceptively low risk numbers. This situation is so common that low risk numbers are often a signal to look for god activities.
The case study in Chapter 11 used the simple guidelines of risk lower than 0.75 and higher than 0.3 to include and exclude project design options. You can be more precise than general rules of thumb when deciding on your project design options.
In Figure 11-33, at the point of minimum direct cost and immediately to its left, the direct cost curve is basically flat, but the risk curve is steep. This is an expected behavior because the risk curve typically reaches its maximum value before the direct cost reaches its maximum value with the most compressed solutions. The only way to achieve maximum risk before maximum direct cost is if, initially, left of minimum direct cost, the risk curve rises much faster than the direct cost curve. At the point of maximum risk (and a bit to its right), the risk curve is flat or almost flat, while the direct cost curve is fairly steep.
It follows that there must be a point left of minimum direct cost where the risk curve stops rising faster than the direct cost curve. I call that point the risk crossover point. At the crossover point, the risk approaches its maximum. This indicates you should probably avoid compressed solutions with risk values above the crossover. In most projects, the risk crossover point will coincide with the value of 0.75 on the risk curve.
The risk crossover point is a conservative point both because it is not at maximum risk and because it is based on the behavior of the risk and direct cost, rather than an absolute value of risk. That said, given the track record of most software projects, a bit of caution is never a bad thing.
Finding the risk crossover point requires comparing the rate of growth of the direct cost curve and the risk curve. You can do that analytically using some basic calculus, graphically in a spreadsheet, or using a numerical equation solver. The files accompanying this chapter contain all three techniques almost in a template manner so that you can easily find the risk crossover point.
The rate of growth of a curve is expressed by its first derivative, so you have to compare the first derivative of the risk curve with the first derivative of the direct cost curve. The risk model in the example project of Chapter 11 is a polynomial of the third degree with the following form:
$$y = ax^3 + bx^2 + cx + d$$
The first derivative of that polynomial is this second-degree polynomial:
$$y' = 3ax^2 + 2bx + c$$
With the example project, the risk formula is:
Therefore, the first derivative of the risk is:
With the example project, the direct cost formula is:
Therefore, the first derivative of the direct cost is:
There are two issues you need to overcome before you can compare the two derivative equations. The first issue is that the ranges of values between maximum risk and minimum direct cost in both curves are monotonically decreasing (meaning the rates of growth of the two curves will be negative numbers), so you must compare the absolute values of the rates of growth. The second issue is that the raw rates of growth are incompatible in magnitude. The risk values range between 0 and 1, while the cost values are approximately 30 for the example project. To correctly compare the two derivatives, you must first scale the risk values to the cost values at the point of maximum risk.
The recommended scaling factor is given by
$$F = \frac{C(t_{mr})}{R(t_{mr})}$$
where:
- $t_{mr}$ is the time for maximum risk.
- $R(t_{mr})$ is the project’s risk formula value at $t_{mr}$.
- $C(t_{mr})$ is the project’s cost formula value at $t_{mr}$.
The risk curve is maximized when the first derivative of the risk curve, R', is zero. Solving the project’s risk equation for t when R' = 0 yields a tmr of 8.3 months. The corresponding risk value, R, is 0.85, and the corresponding direct cost value is 28 man-months. The ratio between these two values, F, is 32.93, the scaling factor for the example project.
The acceptable risk level for the project occurs when all of the following conditions are met:
- Time is to the left of the point of minimum risk of the project.
- Time is to the right of the point of maximum risk of the project.
- Risk rises faster than cost in absolute value to scale.
You can put these conditions together in the form of this expression:
$$F \cdot |R'(t)| > |C'(t)|$$
Substituting the equations for the risk and direct cost derivatives as well as the scaling factor, and then solving, provides the acceptable range for t:
$$9.03 < t < 12.31$$
The result is not one, but two crossover points, at 9.03 and 12.31 months. Figure 12-1 visualizes the behavior of the scaled risk and cost derivatives in absolute value. You can clearly see that the risk derivative in absolute value crosses over the cost derivative in absolute value in two places (hence crossover points).
Figure 12-1 Risk crossover points
Math aside, the reason why there are two risk crossover points has to do with the semantics of the points from a project design perspective. At 9.03 months, the risk is 0.81; at 12.31 months, the risk is 0.28. Superimposing these values on the risk curve and the direct cost curve in Figure 12-2 reveals the true meaning of the crossover points.
Figure 12-2 Risk inclusion and exclusion zones
Project design solutions to the left of the 9.03-month risk crossover point are too risky. Project design solutions to the right of the 12.31-month risk crossover point are too safe. In between the two risk crossover points, the risk is “just right.”
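The crossover search described above can be sketched numerically. The following is a minimal sketch, not the book's accompanying files: the polynomial coefficients in `RISK` and `COST` are hypothetical stand-ins (the Chapter 11 values are not reproduced here), and the function names are made up for illustration. It assumes a cubic risk model, a quadratic direct cost model, and a scaling factor equal to the cost at maximum risk divided by the risk at maximum risk, then scans between maximum and minimum risk for the points where the scaled risk derivative and the cost derivative are equal in absolute value.

```python
import math

# Hypothetical coefficients for illustration only (NOT the Chapter 11 values).
RISK = (0.01, -0.32, 3.2, -9.39)   # R(t) = a*t^3 + b*t^2 + c*t + d
COST = (0.9, -20.0, 140.0)         # C(t) = p*t^2 + q*t + r


def risk(t):
    a, b, c, d = RISK
    return a * t**3 + b * t**2 + c * t + d


def risk_slope(t):
    a, b, c, _ = RISK
    return 3 * a * t**2 + 2 * b * t + c


def cost(t):
    p, q, r = COST
    return p * t**2 + q * t + r


def cost_slope(t):
    p, q, _ = COST
    return 2 * p * t + q


def risk_extremes():
    """Return (t_max_risk, t_min_risk), the two roots of R'(t) = 0."""
    a, b, c, _ = RISK
    root = math.sqrt((2 * b) ** 2 - 4 * (3 * a) * c)
    t1 = (-2 * b - root) / (6 * a)
    t2 = (-2 * b + root) / (6 * a)
    # R''(t) = 6at + 2b is negative at a maximum.
    return (t1, t2) if 6 * a * t1 + 2 * b < 0 else (t2, t1)


def scaled_gap(t, scale):
    """Positive where scaled risk changes faster than cost (absolute values)."""
    return scale * abs(risk_slope(t)) - abs(cost_slope(t))


def crossover_points(steps=2000):
    """Find the crossovers between maximum and minimum risk by scan + bisection."""
    t_mr, t_min = risk_extremes()
    scale = cost(t_mr) / risk(t_mr)            # scaling factor F
    width = (t_min - t_mr) / steps
    points = []
    for i in range(steps):
        lo, hi = t_mr + i * width, t_mr + (i + 1) * width
        if scaled_gap(lo, scale) * scaled_gap(hi, scale) < 0:   # sign change
            for _ in range(60):                                 # bisection
                mid = (lo + hi) / 2
                if scaled_gap(lo, scale) * scaled_gap(mid, scale) <= 0:
                    hi = mid
                else:
                    lo = mid
            points.append((lo + hi) / 2)
    return points


t_max_risk, t_min_risk = risk_extremes()
factor = cost(t_max_risk) / risk(t_max_risk)
crossovers = crossover_points()
for t in crossovers:
    print(f"crossover at t = {t:.2f}, risk = {risk(t):.2f}")
```

With real project data, you would replace the two coefficient tuples with the correlation models fitted to your own risk and direct cost curves; the design options between the two printed times fall in the acceptable zone.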
The risk values at the crossover points of 0.81 and 0.28 agree closely with the rules of thumb of 0.75 and 0.30. For the example project, the acceptable risk zone includes the first compressed solution, the normal solution, and the decompression points of D4, D3, and D2 (see Figure 11-35). All of these points are practical design options. “Practical” in this context means the project stands a reasonable chance of meeting its commitments. The more compressed solutions are too risky, and the D1 point is too safe. You can further select between the decompression points by finding the best decompression target.
As Chapter 10 pointed out, the risk level of 0.5 is the steepest point in the risk curve. This makes it the ideal decompression target because it offers the best return—that is, for the least amount of decompression, you get the most reduction in risk. This ideal point is the tipping point of risk, and therefore it is the minimum point of decompression.
If you have plotted the risk curve, you can see where that tipping point is located, and if you have one, select a decompression point at the tipping point, or more conservatively, to its right. This technique was used in Chapter 11 to recommend D3 in Figure 11-29 as the decompression target. However, merely eyeballing a chart is not a good engineering practice. Instead, you should apply elementary calculus to identify the decompression target in a consistent and objective manner.
Given that the risk curve emulates a standard logistic function (at least between minimum and maximum risk), the steepest point in the curve also marks a twist or inflection point in the curve. To the left of that point the risk curve is concave, and to the right of it the risk curve is convex. Calculus tells us that at the inflection point, where concave becomes convex, the second derivative of the curve is zero. The ideal risk curve and its first two derivatives are shown graphically in Figure 12-3.
Figure 12-3 The inflection point as decompression target
Using the example project from Chapter 11 to demonstrate this technique, you have the risk equation as a polynomial of the third degree, $y = ax^3 + bx^2 + cx + d$. Its first and second derivatives are:
$$y' = 3ax^2 + 2bx + c$$
$$y'' = 6ax + 2b$$
Equating the second derivative to zero provides this formula:
$$x = -\frac{b}{3a}$$
Since the risk model is:
the point at which the second derivative is zero is at 10.62 months:
At 10.62 months, the risk value is 0.55, which differs only 10% from the ideal target of 0.5. When plotted on the discrete risk curves in Figure 12-4, you can see that this value falls right between D4 and D3, substantiating the choice in Chapter 11 of D3 as the decompression target.
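The inflection-point calculation is short enough to sketch directly. The snippet below is an illustration under stated assumptions: the cubic coefficients are hypothetical (they are not the example project's values, so the printed time will not be 10.62 months), and the function names are invented for this sketch. It locates the point where the second derivative of a cubic risk model is zero and reports the risk there.

```python
# Hypothetical cubic risk model R(t) = A*t^3 + B*t^2 + C*t + D.
# These coefficients are illustrative, not the Chapter 11 values.
A, B, C, D = 0.01, -0.32, 3.2, -9.39


def risk(t):
    return A * t**3 + B * t**2 + C * t + D


def risk_slope(t):
    # First derivative: 3At^2 + 2Bt + C
    return 3 * A * t**2 + 2 * B * t + C


def risk_second_derivative(t):
    # Second derivative: 6At + 2B
    return 6 * A * t + 2 * B


def decompression_target():
    """Time at which the second derivative is zero (the inflection point)."""
    return -B / (3 * A)


target = decompression_target()
print(f"decompression target: t = {target:.2f}, risk = {risk(target):.2f}")
```

Because the slope of the risk curve is at its most negative at the inflection point, any decompression point to the left or right of it buys less risk reduction per unit of time.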
Figure 12-4 Decompression target on the risk curves
Unlike in Chapter 11, which used visualization of the risk chart and a judgment call to identify the tipping point, the second derivative provides an objective and repeatable criterion. This is especially important when there is no immediately obvious visual risk tipping point or when the risk curve is skewed higher or lower, making the 0.5 guideline unusable.
The risk models presented in Chapter 10 all use a form of arithmetic mean of the floats to calculate the risk. Unfortunately, the arithmetic mean handles an uneven distribution of values poorly. For example, consider the series [1, 2, 3, 1000]. The arithmetic mean of that series is 252, which does not represent the values in the series well at all. This behavior is not unique to risk calculations, and any attempt at using an arithmetic mean in the face of very uneven distribution will yield an unsatisfactory result. In such a case it is better to use a geometric rather than an arithmetic mean.
The geometric mean of a series of n values is the nth root of the product of all the values in the series. Given a series of values $a_1$ to $a_n$, the geometric mean of that series would be:
$$\sqrt[n]{a_1 \cdot a_2 \cdots a_n} = \left(\prod_{i=1}^{n} a_i\right)^{1/n}$$
For example, while the arithmetic mean of the series [2, 4, 6] is 4, the geometric mean is 3.63:
$$\sqrt[3]{2 \cdot 4 \cdot 6} = \sqrt[3]{48} \approx 3.63$$
The geometric mean is always less than or equal to the arithmetic mean of the same series of values:
$$\sqrt[n]{a_1 \cdot a_2 \cdots a_n} \le \frac{a_1 + a_2 + \cdots + a_n}{n}$$
The two means are equal only when all values in the series are identical.
While initially the geometric mean looks like an algebraic oddity, it shines when it comes to an uneven distribution of values. In the geometric mean calculation, extreme outliers have much less effect on the result. For the example series of [1, 2, 3, 1000], the geometric mean is 8.8 and is a better representation of the first three numbers in the series.
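As a quick check of the two example series, here is a small sketch that computes both means side by side. The helper names are made up for this illustration; the only library call is `math.prod` from the Python standard library (Python 3.8 or later).

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # nth root of the product of all n values
    return math.prod(values) ** (1 / len(values))


for series in ([2, 4, 6], [1, 2, 3, 1000]):
    print(series,
          "arithmetic:", round(arithmetic_mean(series), 2),
          "geometric:", round(geometric_mean(series), 2))
```

The single outlier of 1000 pulls the arithmetic mean far away from the other three values, while the geometric mean stays within an order of magnitude of them.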
As with the arithmetic criticality risk, you can use the float color coding and the corresponding number of activities to calculate the geometric criticality risk. Instead of multiplying the float weight by the number of activities, you raise it to that power. The geometric criticality formula is:
$$Risk = \frac{\sqrt[N]{W_C^{N_C} \cdot W_R^{N_R} \cdot W_Y^{N_Y} \cdot W_G^{N_G}}}{W_C}$$
where:
- $W_C$ is the weight of critical activities.
- $W_R$ is the weight of red activities.
- $W_Y$ is the weight of yellow activities.
- $W_G$ is the weight of green activities.
- $N_C$ is the number of critical activities.
- $N_R$ is the number of red activities.
- $N_Y$ is the number of yellow activities.
- $N_G$ is the number of green activities.
- $N$ is the number of activities in the project ($N = N_C + N_R + N_Y + N_G$).
Using the example network of Figure 10-4, the geometric criticality risk is:
The corresponding arithmetic criticality risk for the same network is 0.69. As expected, the geometric criticality risk is slightly lower than the arithmetic criticality risk.
Like the arithmetic criticality risk, the geometric criticality risk has the maximum value of 1.0 when all activities are critical and a minimum value of $W_G$ over $W_C$ when all activities in the network are green:
$$Risk = \frac{\sqrt[N]{W_G^{N}}}{W_C} = \frac{W_G}{W_C}$$
You can use the Fibonacci ratio between criticality weights to produce the geometric Fibonacci risk model. Given this definition of weights:
$$W_Y = \varphi \cdot W_G \qquad W_R = \varphi^2 \cdot W_G \qquad W_C = \varphi^3 \cdot W_G$$
The geometric Fibonacci formula is:
$$Risk = \frac{\sqrt[N]{\varphi^{3N_C + 2N_R + N_Y}}}{\varphi^3} = \varphi^{\frac{3N_C + 2N_R + N_Y}{N} - 3}$$
Like the arithmetic Fibonacci risk, the geometric Fibonacci risk has the maximum value of 1.0 when all activities are critical and a minimum value of 0.24 ($\varphi^{-3}$) when all activities in the network are green.
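The geometric criticality and geometric Fibonacci models can be sketched with one function that takes the weights as a parameter. This is an illustration under assumptions, not the book's implementation: the linear weights of 4, 3, 2, and 1 and the activity counts are hypothetical (the counts are not taken from Figure 10-4), the arithmetic form shown for comparison is assumed to be the weighted sum divided by the critical weight times the activity count, and the function names are invented for this sketch.

```python
PHI = (1 + 5 ** 0.5) / 2   # golden ratio, approximately 1.618

# Weights are ordered (critical, red, yellow, green).
LINEAR_WEIGHTS = (4, 3, 2, 1)                        # assumed for illustration
FIBONACCI_WEIGHTS = (PHI ** 3, PHI ** 2, PHI, 1.0)   # each weight is PHI times the next


def geometric_criticality_risk(counts, weights):
    """Nth root of the product of weight**count, divided by the critical weight."""
    n = sum(counts)
    product = 1.0
    for weight, count in zip(weights, counts):
        product *= weight ** count
    return product ** (1 / n) / weights[0]


def arithmetic_criticality_risk(counts, weights):
    """Assumed arithmetic counterpart: weighted sum over (critical weight * N)."""
    n = sum(counts)
    weighted = sum(weight * count for weight, count in zip(weights, counts))
    return weighted / (weights[0] * n)


# Hypothetical network: 6 critical, 4 red, 2 yellow, 4 green activities.
counts = (6, 4, 2, 4)
print("arithmetic criticality:", round(arithmetic_criticality_risk(counts, LINEAR_WEIGHTS), 2))
print("geometric criticality: ", round(geometric_criticality_risk(counts, LINEAR_WEIGHTS), 2))
print("geometric Fibonacci:   ", round(geometric_criticality_risk(counts, FIBONACCI_WEIGHTS), 2))
```

Passing the weights as a tuple keeps the two models in one code path, which makes it easy to confirm the boundary values: an all-critical network gives 1.0 with either weight set, and an all-green network gives the ratio of the green weight to the critical weight.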
The geometric activity risk formula uses a geometric mean of the floats in the project. Critical activities have zero float, which creates a problem because the geometric mean will always be zero. The common workaround is to add 1 to all values in the series and subtract 1 from the resulting geometric mean.
The geometric activity risk formula is therefore:
$$Risk = 1 - \frac{\sqrt[N]{\prod_{i=1}^{N}(F_i + 1)} - 1}{M}$$
where:
- $F_i$ is the float of activity i.
- $N$ is the number of activities in the project.
- $M$ is the maximum float of any activity in the project, or Max($F_1, F_2, \ldots, F_N$).
Using the example network of Figure 10-4, the geometric activity risk would be:
The corresponding arithmetic activity risk for the same network is 0.67.
The maximum value of the geometric activity model approaches 1.0 as more activities become critical, but it is undefined when all activities are critical. The geometric activity risk has a minimum value of 0 when all activities have the same level of float. Unlike the arithmetic activity risk, with the geometric activity risk there is no need to adjust outliers of abnormally high float, and the floats do not need to be uniformly spread.
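The add-one workaround and the boundary behavior described above can be sketched as follows. This is a hedged illustration: the float values are invented to mimic a project with a few critical god activities surrounded by high-float satellite activities, the arithmetic activity risk is assumed to be one minus the sum of the floats divided by the maximum float times the number of activities, and the function names are hypothetical.

```python
import math


def geometric_activity_risk(floats):
    """1 - (geometric mean of (float + 1), minus 1) / maximum float."""
    max_float = max(floats)
    if max_float == 0:
        # All activities are critical: the formula divides by zero.
        raise ValueError("geometric activity risk is undefined when all floats are zero")
    shifted_mean = math.prod(f + 1 for f in floats) ** (1 / len(floats))
    return 1 - (shifted_mean - 1) / max_float


def arithmetic_activity_risk(floats):
    """Assumed arithmetic counterpart: 1 - sum(floats) / (max float * N)."""
    return 1 - sum(floats) / (max(floats) * len(floats))


# Hypothetical floats (in days): three critical god activities, five satellites.
god_project = [0, 0, 0, 40, 45, 50, 50, 55]
print("arithmetic activity risk:", round(arithmetic_activity_risk(god_project), 2))
print("geometric activity risk: ", round(geometric_activity_risk(god_project), 2))
```

For this made-up network the arithmetic model reports a moderate risk while the geometric model reports a much higher one, which is the behavior that makes the geometric activity model useful for projects with god activities.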
Both the geometric criticality risk and the geometric Fibonacci risk models yield results that are very similar to their arithmetic counterparts. However, the geometric activity formula does not track well with its arithmetic kin, and its value is much higher across the range. The result is that the geometric activity risk values typically do not conform to the risk value guidelines provided in this book.
Figure 12-5 illustrates the difference in behavior between the geometric and arithmetic risk models by plotting all of the risk curves of the example project from Chapter 11.
Figure 12-5 Geometric versus arithmetic risk models
You can see that the geometric criticality and geometric Fibonacci risk have the same general shape as the arithmetic models, only slightly lower, as expected. You can clearly observe the same risk tipping point. The geometric activity risk is greatly elevated, and its behavior is very different from the arithmetic activity risk. There is no easily discernable risk tipping point.
The near-identical behavior of the arithmetic and geometric criticality (as well as Fibonacci) risk models illustrates that it does not matter much which one you use. The differences do not justify the time and effort involved in building yet another risk curve for the project. If anything, just for the sake of simplicity when explaining risk modeling to others, you should choose the arithmetic model. The geometric activity risk is clearly less useful than the arithmetic activity risk, but its utility in one case is why I decided to discuss geometric risk.
Geometric activity risk is the last resort when trying to calculate the risk of a project with god activities. Such a project in effect has very high risk since most of the effort is spent on the critical god activities. As explained previously, due to the size of the god activities, the other activities have considerable float, which in turn skews the arithmetic risk lower, giving you a false sense of safety. In contrast, the geometric activity risk model provides the expected high risk value for projects with god activities. You can produce a correlation model for the geometric activity risk and perform the same risk analysis as with the arithmetic model.
Figure 12-6 shows the geometric activity risk and its correlation model for the example project presented in Chapter 11.
Figure 12-6 Geometric activity risk model
The point of maximum risk, 8.3 months, is shared by both the arithmetic and geometric models. The minimum decompression target for the geometric activity model (where the second derivative is zero) comes at 10.94 months, similar to the 10.62 months of the arithmetic model and just to the right of D3. The geometric risk crossover points are 9.44 months and 12.25 months—a slightly narrower range than the 9.03 months and 12.31 months obtained when using the arithmetic activity risk model. As you can see, the results are largely similar for the two models, even though the behavior of the risk curve is very different.
Of course, instead of finding a way to calculate the risk of a project with god activities, you should fix the god activities as discussed previously. Geometric risk, however, allows you to deal with things the way they are, not the way they should be.
In the previous chapters, the discussion of project design focused on driving educated decisions before work starts. Only by quantifying the duration, cost, and risk can you decide if the project is affordable and feasible. However, two project design options could be similar in their duration, cost, and risk, but differ greatly in their execution complexity. Execution complexity in this context refers to how convoluted and challenging the project network is.
Cyclomatic complexity measures connectivity complexity. It is useful in measuring the complexity of anything that you can express as a network, including code and the project.
The cyclomatic complexity formula is:
$$Complexity = E - N + 2P$$
For project execution complexity:
- $E$ is the number of dependencies in the project.
- $N$ is the number of activities in the project.
- $P$ is the number of disconnected networks in the project.
In a well-designed project, P is always 1 because you should have a single network for your project. Multiple networks (P > 1) make the project more complex.
To demonstrate the cyclomatic complexity formula, given the network in Table 12-1, E equals 6, N is 5, and P is 1. The cyclomatic complexity is 3:
$$Complexity = 6 - 5 + 2 \cdot 1 = 3$$
Table 12-1 Example network with cyclomatic complexity of 3
| ID | Activity | Depends On |
|---|---|---|
| 1 | A | |
| 2 | B | |
| 3 | C | 1, 2 |
| 4 | D | 1, 2 |
| 5 | E | 3, 4 |
While there is no direct way to measure the execution complexity of the project, you can use the cyclomatic complexity formula as its proxy. The more internal dependencies the project has, the riskier and more challenging it is to execute. Any of these dependencies can be delayed, causing cascading delays in multiple other places in the project. The maximum cyclomatic complexity of a project with N activities is on the order of $N^2$, reached when every one of the activities depends on all the other activities.
In general, the more parallel the project, the higher its execution complexity will be. At the very least, it is challenging to have a larger staff available in time for all the parallel activities. The parallel work (and the additional work required to enable the parallel work) increases both the workload and the team size. A larger team will be less efficient and more demanding to manage. Parallel work also results in higher cyclomatic complexity because the parallel work increases E faster than it increases N. At the extreme, a project with N activities starting at the same time and finishing together, where each activity is independent of all other activities and the activities are all done in parallel, has a cyclomatic complexity of N + 2. Such a project has a huge execution risk.
In much the same way, the more sequential the project, the simpler and less complex it will be to execute. At the extreme, the simplest project with N activities is a serial string of activities. Such a project has the minimum possible cyclomatic complexity of exactly 1. Subcritical projects with very few resources tend to resemble such long strings of activities. While the design risk of such a subcritical project is high (approaching 1.0), the execution risk is very low.
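The formula can be checked against the network of Table 12-1 and against the serial string of activities described above. The sketch below is illustrative only: the dictionary representation (activity ID mapped to the IDs it depends on) and the function name are assumptions made for this example, and the number of disconnected networks is counted with a small union-find.

```python
def cyclomatic_complexity(depends_on):
    """Compute E - N + 2P for a project network.

    depends_on maps each activity ID to the list of activity IDs it depends on.
    """
    n = len(depends_on)                                     # N: activities
    e = sum(len(deps) for deps in depends_on.values())      # E: dependencies

    # Count disconnected networks (P) with union-find.
    parent = {activity: activity for activity in depends_on}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]                   # path halving
            x = parent[x]
        return x

    for activity, deps in depends_on.items():
        for dep in deps:
            parent[find(activity)] = find(dep)

    p = len({find(activity) for activity in depends_on})
    return e - n + 2 * p


# The network of Table 12-1.
table_12_1 = {1: [], 2: [], 3: [1, 2], 4: [1, 2], 5: [3, 4]}

# A serial string of five activities, each depending on the previous one.
serial = {1: [], 2: [1], 3: [2], 4: [3], 5: [4]}

print("Table 12-1 network:", cyclomatic_complexity(table_12_1))
print("Serial string:     ", cyclomatic_complexity(serial))
```

Counting P from the data rather than assuming it is 1 means the same function also flags a project that has accidentally been split into multiple disconnected networks.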
Empirically, I find that well-designed projects have a cyclomatic complexity of 10 or 12. While this level may seem low, you must understand that the chance of meeting your commitments is disproportionately related to the execution complexity. For example, a project with cyclomatic complexity of 15 may be only 25% more complex than a project with cyclomatic complexity of 12, but the lower-complexity project may be twice as likely to succeed. High execution complexity is therefore positively correlated with the likelihood of failure. The more complex the project, the more likely you are to fail to meet your commitments. In addition, successfully delivering on one complex project is no guarantee that you will be able to repeat that success with another complex project.
It is certainly possible to repeatedly deliver projects with high cyclomatic complexity levels, but it takes time to build such capabilities across the organization. It requires a sound architecture, great project design within the risk guidelines, a team whose members are used to working together and are at peak productivity, and a top-notch project manager who pays meticulous attention to details and proactively handles conflicts. Lacking these ingredients, you should take active steps to reduce the execution complexity using the design-by-layers and network-of-networks techniques described later on in this chapter.
Complexity tends to increase with compression and is likely to do so in a nonlinear manner. Ideally, the complexity of your project design solutions as a function of their durations will look like the dashed curve of Figure 12-7.
Figure 12-7 Project time–complexity curve
The problem with such a classic nonlinear behavior is it does not account for compressing the project by using more skilled resources without any change to the project network. The dashed line also presumes that complexity can be further reduced with ever-increasing allocation of time, but, as previously stated, complexity has a hard minimum at 1. A better model of the project complexity is some kind of a logistic function (the solid line in Figure 12-7).
The relatively flat area of the logistic function represents the case of working with better resources. The sharp rise on the left of the curve corresponds to parallel work and compressing the project. The sharp drop on the right of the curve represents the project’s subcritical solutions (which also take considerably more time). Figure 12-8 demonstrates this behavior by plotting the complexity curve of the example project from Chapter 11.
Figure 12-8 The example project time–complexity curve
Recall from Chapter 11 that even the most compressed solution was not materially more expensive than the normal solution. Complexity analysis reveals that the true cost of maximum compression in this case was a 25% increase in cyclomatic complexity—an indicator that the project execution is far more challenging and risky.
The project design methodology in this book works well regardless of scale. It does, however, become more challenging as the project gets bigger. There is a maximum capacity of the human brain to maintain a mental picture of the details, constraints, and interdependencies within the project. At some project size, you will lose your ability to design the project. Most people can design a project that has up to 100 activities or so. With practice, this number can increase. A well-designed system and project make it possible to handle even a few hundred activities.
Megaprojects with many hundreds or even thousands of activities have their own level of complexity. They typically involve multiple sites, dozens or hundreds of people, huge budgets, and aggressive schedules. In fact, you typically see the last three in tandem because the company first commits to an aggressive schedule and then throws people and money at the project, hoping the schedule will yield.
The larger the project becomes, the more challenging the design and the more imperative it is to design the project. First, the larger the project, the more is at stake if it fails. Second, and even more importantly, you have to plan to work in parallel out of the gate because no one will wait 500 years—or, for that matter, even 5 years—for delivery. Making things worse, with a megaproject the heat will be on from the very first day, because such projects place the future of the company at stake, and many careers are on the line. You will be under the spotlight with managers swarming around like angry yellow jackets.
Almost without exception, all megaprojects end up as megafailures. Size maps directly to poor outcomes.¹ The larger the project, the larger the deviation will be from its commitments, with longer delays and higher and higher costs incurred relative to the initial schedule and budget. Megaprojects are modern-day failed ziggurats on a biblical scale.
1. Nassim Nicholas Taleb, Antifragile (Random House, 2012).
The fact that large projects are ordained to fail is not an accident, but rather a direct result of their complexity. In this context, it is important to distinguish between complex and complicated. Most software systems are complicated, not complex. A complicated system can still have deterministic behavior, and you can understand its inner workings exactly. Such a system will have known repeatable responses to set inputs, and its past behavior is indicative of its future behavior. In contrast to a complicated system, the weather, the economy, and your body are complex systems. Complex systems are characterized by lack of understanding of the internal mechanism at play and inability to predict behavior. This complex behavior is not necessarily due to numerous complicated internal parts. For example, three bodies orbiting one another are a complex nondeterministic system. Even a simple pendulum with a pivot is a complex system. While both of these examples are not complicated, they are still complex systems.
In the past, complex software systems were limited to mission-critical systems, where the underlying domain was inherently complex. Over the past two decades, due to increased systems connectivity, diversity, and the scale of cloud computing, enterprise systems and even just regular software systems now exhibit complex system traits.
A fundamental attribute of complex systems is that they respond in nonlinear ways to minute changes in the conditions. This is the last-snowflake effect, in which a single additional flake can cause an avalanche on a snow-laden mountain side.
The single snowflake is so risky because complexity grows nonlinearly with size. In large systems, the increase in complexity causes a commensurate increase in the risk of failure. The risk function itself can be a highly nonlinear function of complexity, akin to a power law function. Even if the base of the function is almost 1, and the system grows slowly in size (one additional line of code at a time or one more snowflake on the mountain side), over time the growth in complexity and its compounding effect on risk will cause a failure due to a runaway reaction.
Complexity theory² strives to explain why complex systems behave as they do. According to complexity theory, all complex systems share four key elements: connectivity, diversity, interactions, and feedback loops. Any nonlinear failure behavior is the product of these complexity drivers.
2. https://en.wikipedia.org/wiki/Complex_system
Even if the system is large, if the parts are disconnected, complexity will not raise its head. In a connected system with $n$ parts, connectivity complexity grows in proportion to $n^2$ (a relationship known as Metcalfe’s law³). You could even make the case for connectivity complexity on the order of $n^n$ due to ripple effects, where any single change causes $n$ changes and each of those causes $n$ additional changes, and so on.
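To get a feel for these growth rates, here is a minimal sketch. It counts the pairwise links among $n$ fully connected parts, $n(n-1)/2$, which grows in proportion to $n^2$, and it counts the changes produced when every change triggers $n$ further changes. The function names and the `depth` cutoff are illustrative assumptions, not something the text defines.

```python
def pairwise_connections(n: int) -> int:
    """Distinct links among n fully connected parts: n(n-1)/2, on the order of n^2."""
    return n * (n - 1) // 2


def ripple_changes(n: int, depth: int) -> int:
    """Total changes when one change causes n changes, each of which causes
    n more, followed for `depth` levels (an assumed cutoff for illustration)."""
    return sum(n ** level for level in range(1, depth + 1))


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(f"n={n:>2}  links={pairwise_connections(n):>4}  "
              f"ripple(depth=3)={ripple_changes(n, 3)}")
```

Doubling the number of parts roughly quadruples the number of links, while the ripple count grows far faster still.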
3. https://en.wikipedia.org/wiki/Metcalfe's_law
The system can still have connected parts and not be that complex to manage and control if the parts are clones or simple variations of one another. On the other hand, the more diverse the system is (such as having different teams with their own tools, coding standards, or design), the more complex and error prone that system will be. For example, consider an airline that uses 20 different types of airplanes, each specific for its own market, with unique parts, oils, pilots, and maintenance schedules. This very complex system is bound to fail simply because of diversity. Compare that with an airline that uses just a single generic type of airplane that is not designed for any market in particular and can serve all markets, passengers, and ranges. This second airline is not just simpler to run: It is more robust and can respond much more quickly to changes in the marketplace. These ideas should resonate with the advantages of composable design discussed in Chapter 4.
You can even control and manage a connected diverse system as long as you do not allow intense interactions between the parts. Such interactions can have destabilizing unintended consequences across the system, often involving diverse aspects such as schedule, cost, quality, execution, performance, reliability, cash flow, customer satisfaction, retention, and morale. Unabated, these changes will trigger more interactions in the form of feedback loops. Such feedback loops magnify the problems to the point that input or state conditions that were not an issue in the past become capable of taking the system down.
The other reason large projects fail has to do with quality. When a complex system depends on the completion of a series of tasks (such as a series of interactions between services or activities in a project), and when the failure of any task causes failure of the whole, any quality issue produces severe side effects, even if the components are very simple. This was demonstrated in 1986 when a 30-cent O-ring brought down a $3 billion space shuttle.
When the quality of the whole depends on the quality of all the components, the overall quality is the product of the qualities of the individual elements.⁴ The result is highly nonlinear decay behavior. For example, suppose the system performs a complex task composed of 10 smaller tasks, each having a near-perfect quality of 99%. In that case the aggregate quality is only 90% ($0.99^{10} \approx 0.904$).
4. Michael Kremer, “The O-Ring Theory of Economic Development,” Quarterly Journal of Economics 108, no. 3 (1993): 551-575.
Even this assumption of 99% quality or reliability is unrealistic because most software units are never tested to within 99% of all possible inputs, all possible interactions with all connecting components, all possible feedback loops of state changes, all deployments and customer environments, and so on. The realistic unit quality figures are probably lower. If each unit was tested and qualified within a 90% level, the system quality drops to 35%. A 10% decrease in quality per component degrades the overall outcome by 65%.
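The arithmetic behind these figures can be sketched in a few lines. The helper below multiplies the per-unit qualities, which is the product rule described above; the function name and the option of passing unequal qualities are assumptions made for illustration.

```python
import math


def aggregate_quality(unit_qualities):
    """Overall quality as the product of the individual unit qualities."""
    return math.prod(unit_qualities)


if __name__ == "__main__":
    # Ten units at 99% each, then ten units at 90% each.
    print(round(aggregate_quality([0.99] * 10), 3))  # 0.904
    print(round(aggregate_quality([0.90] * 10), 3))  # 0.349
```

Because the qualities multiply, adding units can only lower the aggregate, which is why the effect worsens as the system grows.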
The more components the system has, the worse the effect becomes, and the more vulnerable the system becomes to any quality issues. This explains why large projects often suffer from poor quality to the point of being unusable.
The key to success in large projects is to negate the drivers of complexity by reducing the size of the project. You must approach the project as a network of networks. Instead of one very large project, you create several smaller, less complex projects that are far more likely to succeed. The cost will typically increase at least by a little, but the likelihood of failure will decrease considerably.
With a network of networks, there is a proviso that the project is feasible, that it is somehow possible to build the project in this way. If the project is feasible, then there is a good probability that the networks are not tightly coupled and that the segmentation into separate subnetworks is possible. Otherwise, the project is destined to fail.
Once you have the network of networks, you design, manage, and execute each of them just like any other project.
Since you do not know in advance if the segmentation is possible or what the network of networks looks like, you must engage in a preliminary mini-project whose mission is to discover the network of networks. There is never just one way of designing the network of networks; indeed, there are usually multiple possibilities for shape and structure. These possibilities are hardly ever equivalent because a few of them are likely to be easier to deal with than others. You must compare and contrast the various options.
As with all design efforts, your approach to designing the network of networks should be iterative. Start by designing the megaproject, and then chop it into individual manageable projects along and beside the mega critical path. Look for junctions where the networks interact. These junctions are a great place to start with the segmentation. Look for junctions not just of dependencies but also of time: If an entire set of activities would complete before another set could start, then there is a time junction, even if the dependencies are all intertwined. A more advanced technique is to look for a segmentation that minimizes the total cyclomatic complexity of the network of networks. In this case, P greater than 1 is acceptable for the total complexity, while each subnetwork has P of 1.
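One way to compare candidate segmentations is to compute the cyclomatic complexity of each. The sketch below assumes the standard formula, complexity = E − N + 2P, where E is the number of dependencies, N the number of activities, and P the number of disconnected parts. The passage above refers only to P, so the formula, the function name, and the sample networks are assumptions for illustration.

```python
from collections import defaultdict


def cyclomatic_complexity(dependencies):
    """Return (complexity, parts) for a network given as (predecessor, successor) pairs.

    Uses E - N + 2P, counting P as the number of connected components
    when the dependencies are treated as undirected links.
    """
    edges = list(dependencies)
    nodes = {node for edge in edges for node in edge}
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, parts = set(), 0
    for start in nodes:
        if start in seen:
            continue
        parts += 1  # found a new disconnected part
        stack = [start]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(neighbors[current] - seen)
    return len(edges) - len(nodes) + 2 * parts, parts


if __name__ == "__main__":
    net_a = [(1, 2), (1, 3), (2, 4), (3, 4)]   # hypothetical subnetwork
    net_b = [(5, 6), (6, 7)]                   # hypothetical subnetwork
    print(cyclomatic_complexity(net_a))          # (2, 1)
    print(cyclomatic_complexity(net_b))          # (1, 1)
    print(cyclomatic_complexity(net_a + net_b))  # (3, 2)
```

In this example each subnetwork has P of 1, while the combined network of networks has P of 2 and a total complexity equal to the sum of its parts.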
Figure 12-9 shows an example megaproject, and Figure 12-10 shows the resulting three independent subnetworks.
Figure 12-9 An example megaproject
Figure 12-10 The resulting network of networks
Quite often, the initial megaproject is just too messy for such work. When this is the case, investing the time to simplify or improve the design of the megaproject will help you identify the network of networks. Look for ways to reduce complexity by introducing planning assumptions and placing constraints on the megaproject. Force certain phases to complete before others start. Eliminate solutions masquerading as requirements.
The diagram in Figure 12-9 underwent several complexity reduction iterations to reach the state shown. The initial diagram was incomprehensible and unworkable.
The network of networks will likely include some dependencies that scuttle the segmentation or somehow prevent parallel work across all networks, at least initially. You can address these by investing in the following network-decoupling techniques:
Architecture and interfaces
Simulators
Development standards
Build, test, and deployment automation
Quality assurance (not mere quality control)
Although there is no set formula for constructing the network of networks, the best guideline is to be creative. You will often find yourself resorting to creative solutions to nontechnical problems that stifle the segmentation. Perhaps political struggles and pushback concentrate parts of the megaproject instead of distributing them. In such cases, you need to identify the power structure and defuse the situation to allow for the segmentation. Perhaps cross-organizational concerns involving rivalries prevent proper communication and cooperation across the networks, manifesting as rigid sequential flow of the project. Or maybe the developers are in separate locations, and management insists on providing some work for each location, in a functional way. Such decomposition has nothing to do with the correct network of networks or where the real skills reside. You may need to propose a massive reorganization, including the possibility of relocating people, to have the organization reflect the network of networks, rather than the other way around (more on this topic in the next section on countering Conway’s law).
Perhaps some legacy group is mandated to be part of the project due to personal favors. Instead of segmentation, this creates a choke point for the project because everything else now revolves around the legacy group. One solution might be to convert the legacy group into a cross-network group of domain expert test engineers.
Finally, try several renderings of the network of networks by different people, for the simple reason that some may see simplicity where others do not. Given what is at stake, you must pursue every angle. Take your time to carefully design the network of networks. Avoid rushing. This will be especially challenging since everyone else will be aching to start work. Due to the project’s size, however, certain failure lurks without this crucial planning and structuring phase.
In 1968, Melvin Conway coined Conway’s law,⁵ which states that organizations that design systems always produce designs that are copies of the communication structures of these organizations. According to Conway, a centralized, top-down organization can produce only centralized, top-down architectures, never distributed architectures. Similarly, an organization structured along functional lines will conceive only functional decompositions of systems. Certainly, in the age of digital communication, Conway’s law is not universal, but it is common.
5. Melvin E. Conway, “How Do Committees Invent?,” Datamation, 14, no. 5 (1968): 28-31.
If Conway’s law poses a threat to your success, a good practical way to counter it is to restructure the organization. To do so, you first establish the correct and adequate design, and then you reflect that design in the organizational structure, the reporting structure, and the communication lines. Do not shy away from proposing such a reorganization as part of your design recommendations at the SDP review.
Although Conway referred originally to system design, his law applies equally well to project design and to the nature of the network. If your project design includes a network of networks, you may have to accompany your design with a restructuring of the organization that mimics those networks. The degree to which you will have to counter Conway’s law even in a regular-size project is case-specific. Be aware of the organizational dynamics and devise the correct structure if your observation (or even your intuition) is telling you it is necessary.
On the other side of the scale from very large projects are small (or even very small) projects. Counterintuitively, it is important to carefully design such small projects. Small projects are even more susceptible to project design mistakes than are regular-size projects. Due to their size they respond much more to changes in their conditions. For example, consider the effects of assigning a person incorrectly. With a team of 15 people, such a mistake affects about 7% of the available resources. With a team of 5 people, it affects 20% of the project resources. A project may be able to survive a 7% mistake, but a 20% mistake is serious trouble. A large project may have the resource buffer to survive mistakes. With a small project, every mistake is critical.
On the positive side, small projects may be so simple that they do not require much project design. For example, if you have only a single resource, the project network is a long string of activities whose duration is the sum of duration across all activities. With very minimal project design, you can know the duration and cost. There is also no need to build the time–cost curve or calculate the risk (it will be 1.0). Since most projects have some form of a network that differs from a simple string or two, and since you should avoid subcritical projects, in a practical sense you almost always design even small projects.
All the project design examples so far in this book have produced their network of activities based on the logical dependencies between activities. I call this approach design by dependencies. There is, however, another option—namely, building the project according to its architecture layers. This is a straightforward process when using The Method’s architectural structure. You could first build the Utilities, then the Resources and the ResourceAccess, followed by Engines, Managers, and Clients, as shown in Figure 12-11. I call this technique design by layers.
Figure 12-11 Project design by layers
As shown in Figure 12-11, the network diagram is basically a series of pulses, each corresponding to a layer in the architecture. While the pulses are sequential and often serialized, internally each pulse is constructed in parallel. The Method’s adherence to the closed architecture principle enables this parallel work inside a pulse.
When designing by layers, the schedule is similar to the same project designed by dependencies. Both cases result in a similar critical path composed of the components of the architecture across the layers.
A downside to designing by layers is the increase in risk. In theory, if all services in each layer are of equal duration, then they are all critical, and the risk number approaches 1.0. Even if that is not the case, any delay in the completion of any layer immediately delays the entire project because subsequent pulses are put on hold. When designing by dependencies, however, only the critical activities run such risk of delaying the project. The best (and nearly mandatory) way of addressing the high risk of a design-by-layers project is to use risk decompression. Because almost all activities will be critical or near critical, the project will respond very well to decompression, as all activities in each pulse gain the additional float. To further compensate for the implicit risk of design by layers, you should decompress the project so that its risk is less than 0.5, perhaps to 0.4. This level of decompression suggests that projects designed by layers will take longer than projects designed by dependencies.
Designing by layers can increase the team size and, in turn, the direct cost of the project. With design by dependencies, you find the lowest level of resources allowing unimpeded progress along the critical path by trading float for resources. With design by layers, you may need as many resources as are required to complete the current layer. The team has to work in parallel on all activities within each pulse and complete all of them before beginning the next pulse. You must assume all the components in the current layer are required by the next layer.
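The pulse structure lends itself to a quick back-of-the-envelope model. The sketch below assumes strictly sequential pulses, one resource per activity, and all activities in a pulse starting together. Under those assumptions each pulse lasts as long as its longest activity, and the peak team size equals the widest pulse. The layer names follow the order given earlier; the activity names and durations are hypothetical.

```python
# Hypothetical activities (name -> duration in days), grouped by layer in build order.
PULSES = [
    ("Utilities",      {"U1": 10, "U2": 15, "U3": 20}),
    ("Resources",      {"R1": 5, "R2": 15}),
    ("ResourceAccess", {"RA1": 10, "RA2": 10}),
    ("Engines",        {"E1": 15}),
    ("Managers",       {"M1": 10, "M2": 10}),
    ("Clients",        {"C1": 25}),
]


def summarize_by_layers(pulses):
    """Return (total_duration, peak_staffing) for a design-by-layers network.

    Assumes sequential pulses and one resource per activity, so a pulse
    takes as long as its longest activity and needs one person per activity.
    """
    total_duration = sum(max(activities.values()) for _, activities in pulses)
    peak_staffing = max(len(activities) for _, activities in pulses)
    return total_duration, peak_staffing


if __name__ == "__main__":
    duration, staffing = summarize_by_layers(PULSES)
    print(f"duration: {duration} days, peak staffing: {staffing}")
    # duration: 95 days, peak staffing: 3
```

Every activity shorter than the longest one in its pulse has float only within that pulse, which is why a delay in any layer pushes out everything after it.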
With that in mind, designing by layers has the clear advantage of producing a very simple project design to execute. It is the best antidote for a complex project network and can reduce overall cyclomatic complexity by half or more. In theory, since the pulses are sequential, at any moment in time the project manager has to contend with only the execution complexity of each pulse and the support activities. The cyclomatic complexity of each pulse roughly matches the number of parallel activities. In a typical Method-based system, this cyclomatic complexity is as low as 4 or 5, whereas the cyclomatic complexity of projects designed by dependencies can be 50 or more.
Many projects in the software industry tolerate both schedule slips and overcapacity; therefore their real challenge is complexity, not duration or cost. When possible, with Method-based systems, I prefer designing by layers to address the otherwise risky and complex execution. As with most things when it comes to project design, designing by layers is predicated on having the right architecture in the first place.
You can combine the techniques of both design by layers and design by dependencies. For example, the example project in Chapter 11 moved all of the infrastructure Utilities to the beginning of the project, despite the fact that their logical dependencies would have allowed them to take place much later in the project. The rest of the project was designed based on logical dependencies.
Designing and building by layers is a perfect example of the design rule from Chapter 4: Features are always and everywhere aspects of integration, not implementation. Only after all the layers are complete can you integrate them into features. This implies that designing by layers is well suited to regular projects rather than larger and more complex projects with multiple independent subsystems. To return to the house analogy, with a simple house the construction is always by layers—typically the foundation, plumbing, walls, roof, and so on. With a large multistory building, each floor is its own separate project that contains plumbing, walls, ceiling, and other tasks.
A final observation is that designing the project by layers basically breaks the project into smaller subprojects. These smaller projects are done sequentially and are separated by junctions of time. This is akin to breaking a megaproject into smaller networks and carries very similar benefits.
While Chapter 11 illustrated an example project, its main purpose was to teach the thought process when using project design techniques and how they interrelate. Only secondarily did the example demonstrate end-to-end project design. The focus in this chapter is how to drive project design decisions in a real-life project and when to apply which project design techniques. The project designed here builds the TradeMe system, the example system from Chapter 5. As with the system design case study in Chapter 5, this chapter derives directly from the actual project that IDesign designed for one of its customers. The design team consisted of two IDesign architects (a veteran and an apprentice) and a project manager from the customer. While this example scrubs or obfuscates the specific business details, I present here the project design as it was. Both the system and the project design effort were completed in less than a week.
All of the data and calculations used in this chapter are available as part of the downloadable reference files for this book. However, when reading this chapter for the first time, I advise you to resist the temptation to crosscheck constantly between the text and the files. Instead, you should focus on the reasoning leading to those calculations and the interpretations of the results. Once those are in hand, you can use this chapter for reference as you explore the data in detail to confirm your understanding and to practice the techniques.
The TradeMe project design effort performed two types of estimations: individual activity estimations and an overall project estimation. The individual activity estimations were used in the project design solutions, and the overall estimation served to validate the project design results.
Estimating the individual activities started by listing the types of activities in the project to avoid missing crucial activities. The team classified the TradeMe activities into three categories:
Structural coding activities
Nonstructural coding activities
Noncoding activities
When building the list of activities, the design team expanded each list to include the individual activities and the duration estimation for each activity. The team also indicated the designated role responsible for each activity according to the customer’s process or their own experience.
The design team clearly documented any initial constraints or assumptions on the estimations. The TradeMe project relied on the following estimation assumptions:
Detailed design. The individual developers were capable of doing the detailed design, so each coding activity contained its own detailed design phase.
Development process. The team was set to build the system quickly and cleanly, while relying on most of the best practices in this book.
The structural activities of TradeMe derived directly from the system architecture (see Figure 5-14). These activities included Utilities, Resources, ResourceAccess, Managers, Engines, and Clients, and were mostly tasks for developers. The architect was responsible for the key activities of the Message Bus and the Workflow Repository. Table 13-1 lists the duration estimation for some of the structural coding activities of the project.
Table 13-1 Duration estimation for some of the structural coding activities
| ID | Activity | Duration (days) | Role |
|---|---|---|---|
| 14 | Logging | 10 | Developer |
| 15 | Message Bus | 15 | Architect |
| 16 | Security | 20 | Developer |
| 18 | Payments DB | 5 | DB Architect |
| … | … | … | … |
| 23 | Workflow Repository | 15 | Architect |
| … | … | … | … |
| 26 | Payments Access | 10 | Developer |
| … | … | … | … |
| 35 | Search Engine | 15 | Developer |
| … | … | … | … |
| 38 | Market Manager | 10 | Developer |
| … | … | … | … |
| 45 | Marketplace App | 25 | Developer |
The TradeMe design team identified a few coding activities that did not map directly to the architecture. These activities were the result of both the system operational concept and the company’s development process. Table 13-2 lists the team’s duration estimation for the nonstructural coding activities of the project.
Table 13-2 Duration estimation for nonstructural coding activities
| ID | Activity | Duration (days) | Role |
|---|---|---|---|
| 10 | System Test Harness | 25 | Test Engineer |
| 36 | Abstract Manager | 30 | Developer |
| 40 | Regression Test Harness | 10 | Developer |
The abstract Manager was a base service for the rest of the Managers in the system. It contained the bulk of the workflow management as well as the message bus interaction. Derived Managers executed specific workflows. The other two activities were both testing-related. The System Test Harness was owned by a test engineer, but the Regression Test Harness was owned by a developer.
TradeMe had many noncoding activities, which tended to concentrate at the beginning or the end of the project. The noncoding activities were owned by various members of the core team, the test engineer, testers, or external experts such as a UX designer. These activities are shown in Table 13-3. This list was also driven by the company’s development process, planning assumptions, and commitment to quality.
表 13-3非编码活动的持续时间估计
Table 13-3 Duration estimation for the noncoding activities
| ID | 活动 Activity | 持续时间(天) Duration (days) | 角色 Role |
|---|---|---|---|
| 2 | 需求 Requirements | 15 | 架构师、产品经理 Architect, Product Manager |
| 3 | 架构 Architecture | 15 | 架构师、产品经理 Architect, Product Manager |
| 4 | 项目规划 Project Planning | 10 | 架构师、项目经理、产品经理 Architect, Project Manager, Product Manager |
| 5 | 管理层培训 Management Education | 5 | 架构师、项目经理、产品经理 Architect, Project Manager, Product Manager |
| 7 | 用户体验设计 UX Design | 10 | 用户体验/用户界面专家 UX/UI Expert |
| 8 | 开发培训 Dev Training | 5 | 架构师 Architect |
| 9 | 测试计划 Test Plan | 25 | 测试工程师 Test Engineer |
| 11 | 构建和设置 Build and Setup | 10 | DevOps |
| 12 | UI 设计 UI Design | 20 | 用户体验/用户界面专家 UX/UI Expert |
| 13 | 用户手册 Manual | 20 | 产品经理 Product Manager |
| 25 | 数据迁移 Data Migration | 10 | 开发人员 Developer |
| 46 | 手册润色 Manual Polishing | 10 | 产品经理 Product Manager |
| 47 | 系统测试 System Testing | 10 | 质量控制 Quality Control |
| 48 | 系统推出 System Rollout | 10 | 架构师、项目经理、产品经理、DevOps Architect, Project Manager, Product Manager, DevOps |
设计团队要求 20 人组成的小组对整个 TradeMe 项目进行估算。提供的唯一输入是 TradeMe 的静态架构和系统的操作概念。设计团队使用宽带估算技术,得出的结果是持续时间为 10.5 个月,平均员工人数为 7.1 人。这相当于总成本为 74.6 人月。
The design team asked a group of 20 people to estimate the TradeMe project as a whole. The only input provided was the static architecture of TradeMe and the system’s operational concept. The design team used the broadband estimation technique and came up with duration of 10.5 months and average staffing of 7.1 people. This equated to a total cost of 74.6 man-months.
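The relationship between the three broadband figures can be checked with a line of arithmetic. This is a minimal sketch that assumes total cost is simply average staffing sustained over the whole duration; the variable names are illustrative.

```python
# Figures quoted in the text for the broadband estimation.
duration_months = 10.5
average_staffing = 7.1

# Assumption: total effort = average head count x duration.
total_cost_man_months = duration_months * average_staffing

# Roughly 74.55, which the text reports as 74.6 man-months.
print(f"{total_cost_man_months:.2f} man-months")
```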
设计团队随后着手确定各种活动之间的依赖关系。TradeMe 的起点是架构和结构组件之间的行为依赖关系。在此基础上,团队添加了非行为依赖关系,例如独立于架构的非编码活动或编码活动。设计团队还利用项目设计模式和合理的复杂性降低技术来简化网络并简化即将到来的项目执行。结果是项目网络的第一次迭代。
The design team then proceeded to determine the dependencies between the various activities. The starting point for TradeMe was the architecture and the behavioral dependencies between the structural components. To those, the team added nonbehavioral dependencies such as noncoding activities or coding activities that were independent of the architecture. The design team also leveraged project design patterns and reasonable complexity reduction techniques both to simplify the network and to ease the upcoming project execution. The result was the first iteration of the project network.
在构建第一组依赖关系时,设计团队检查了用例和支持它们的调用链。对于每个调用链,他们列出链中的所有组件(通常按架构层次顺序,例如资源在前,客户端在后),然后添加依赖关系。例如,当他们检查 Add Tradesman 用例(参见图 5-18)时,设计团队发现 Membership Manager 调用 Regulation Engine,因此他们将 Regulation Engine 添加为 Membership Manager 的前置活动。
When building the first set of dependencies, the design team examined the use cases and the call chains supporting them. For each call chain, they listed all the components in the chain (often in the architecture hierarchy order, such as Resources first and Clients last) and then added the dependencies. For example, when they examined the Add Tradesman use case (see Figure 5-18), the design team observed that the Membership Manager calls the Regulation Engine, so they added the Regulation Engine as a predecessor to the Membership Manager.
从用例中提取依赖关系需要多次遍历,因为每个调用链都可能揭示不同的依赖关系。设计团队甚至在调用链中发现了一些缺失的依赖关系。例如,仅基于第 5 章的调用链,Regulation Engine 只需要 Regulation Access 服务。经过进一步分析,设计团队认定 Regulation Engine 还依赖于 Projects Access 和 Contractors Access。
Distilling dependencies from the use cases required multiple passes, because each call chain potentially revealed different dependencies. The design team even discovered some missing dependencies in the call chains. For example, based solely on the call chains of Chapter 5, the Regulation Engine required only the Regulation Access service. Upon further analysis, the design team decided that Regulation Engine depended on Projects Access and Contractors Access as well.
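The pass over the call chains can be pictured as a small routine that turns "A calls B" into "B is a predecessor of A", followed by a manual step for dependencies the chains missed. The sketch below is illustrative only: the single chain is an assumed shape built from the component names in the text, and `dependencies_from_call_chains` is a hypothetical helper, not part of any tool the team used.

```python
def dependencies_from_call_chains(call_chains):
    """Map each component to the set of components that must exist before it.

    Each chain is ordered caller first, so every callee becomes
    a predecessor of its caller.
    """
    predecessors = {}
    for chain in call_chains:
        for component in chain:
            predecessors.setdefault(component, set())
        for caller, callee in zip(chain, chain[1:]):
            predecessors[caller].add(callee)
    return predecessors


# Illustrative chain (assumed shape) using names from the text.
call_chains = [
    ["Membership Manager", "Regulation Engine", "Regulation Access"],
]
deps = dependencies_from_call_chains(call_chains)

# Dependencies found by further analysis rather than by the call chains.
deps["Regulation Engine"] |= {"Projects Access", "Contractors Access"}

for component, preds in deps.items():
    print(component, "<-", sorted(preds))
```

Running several chains through the same function accumulates predecessors, which mirrors the multiple passes described above.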
Abstract Manager 封装了常见的工作流管理操作(例如持久性、状态管理)。因此,设计团队在 Abstract Manager 和 Workflow Repository 之间添加了依赖关系。其他管理器本身依赖于 Abstract Manager。同样,Abstract Manager 为所有管理器提供了对 Message Bus 的依赖关系。
The Abstract Manager encapsulated the common workflow management actions (e.g., persistence, state management). Therefore, the design team added a dependency between the Abstract Manager and the Workflow Repository. The other Managers themselves depended on the Abstract Manager. Similarly, the Abstract Manager provided the Message Bus dependency for all Managers.
由于系统的操作概念,一些代码依赖关系隐含在调用链中。在 TradeMe 中,客户端与管理器之间(以及管理器与其他管理器之间)的所有通信都通过消息总线进行,这在它们之间形成了操作依赖关系(而非结构依赖关系)。这些依赖关系表明,客户端需要管理器准备就绪才能进行测试和部署。
Some code dependencies were implicit in the call chains due to the system’s operational concept. In TradeMe, all communication between Clients and Managers (and between Managers and other Managers) flowed over the message bus, creating an operational (not structural) dependency between them. The dependencies indicated that the Clients needed the Managers ready for test and deployment.
TradeMe 还包含无法直接追溯到系统所需行为或其操作概念的依赖关系。这些都涉及编码和非编码活动。这种依赖关系主要源于公司的开发过程和 TradeMe 的规划假设。例如,新系统必须从旧系统中继承遗留数据。数据迁移需要新资源(数据库)首先完成,因此数据迁移活动依赖于资源。同样,管理器的完成需要Regression Test Harness。此外,在项目设计时,计划仍必须考虑一些剩余的前端活动。最后,公司有自己的发布程序和内部依赖关系,它们被纳入为结束活动之间的依赖关系。
TradeMe also contained dependencies that could not be traced directly to the required behavior of the system or its operational concept. These involved coding and noncoding activities alike. Such dependencies originated mostly with the company’s development process and TradeMe’s planning assumptions. For example, the new system had to carry forward the legacy data from the old system. Data migration necessitated that the new Resources (the databases) complete first, so the data migration activity depended on the Resources. Similarly, the completion of the Managers required the Regression Test Harness. In addition, at the time of the project’s design, the plan still had to account for a few remaining front-end activities. Finally, the company had its own release procedures and internal dependencies, which were incorporated as dependencies between the concluding activities.
在 TradeMe 中,一个核心操作概念是使用消息总线。选择正确的消息总线技术,并使消息和契约的详细设计与编码活动与消息总线保持一致,这一点至关重要。调用链派生的依赖关系表明,项目可以将 Message Bus 活动推迟到客户端和管理器需要它时再进行。但是,这样做的风险是开发团队选择的消息总线可能会使先前的设计或实施决策失效。团队认为在项目中首先处理 Message Bus 活动更为安全。
In TradeMe, a core operational concept was the use of a message bus. It was crucial to choose the right message bus technology and align the detailed design and coding activities of messages and contracts with the message bus. The call chain–derived dependencies showed that the project could defer the Message Bus activity until it was needed by the Clients and the Managers. However, that ran the risk that the message bus chosen by the development team might invalidate prior decisions about design or implementation. The team decided it was safer to address the Message Bus activity first in the project.
类似的逻辑也适用于安全性。虽然调用链分析表明只有客户端和管理器需要采取明确的安全措施,但安全性如此重要,以至于项目必须确保 Security 在所有业务逻辑活动之前完成。这确保了所有活动在需要时都有安全支持,并避免安全性成为事后考虑或后期附加功能。
Similar logic applied to security. While the call chain analysis indicated that only the Clients and the Managers needed to take explicit security actions, security was so important that the project had to assure Security completed before all business logic activities. This ensured all activities had security support if they needed it and avoided security becoming an after-thought or a late-stage add-on.
项目设计团队还覆盖了依赖关系,以便降低新兴网络的复杂性。具体来说,他们改变了以下依赖关系:
The project design team also overrode dependencies so that they could reduce the complexity of the emerging network. Specifically, they changed the following dependencies:
首先实现基础设施。在 TradeMe 中,大多数活动都依赖于实用程序组件,例如Logging。将基础设施(也包括Build)移至项目的开头大大减少了项目中的依赖项数量。它还有一个好处,就是在需要时,所有组件都可以使用基础设施,尤其是那些仅根据调用链没有明显需求的组件。
Implementing infrastructure first. In TradeMe, most activities depended on Utility components such as Logging. Moving the infrastructure (which also included the Build) to the beginning of the project drastically reduced the number of dependencies in the project. It also had the benefit of making the infrastructure available to all components in case the need arose, especially those components that had no obvious need based on the call chains alone.
添加里程碑。即使在项目的早期阶段,设计团队也引入了三个里程碑。SDP Review里程碑结束了前端活动。另外两个里程碑是Infrastructure Complete和Managers Complete:所有开发活动都依赖于基础设施里程碑,所有客户端都依赖于管理器的完成。
Adding milestones. Even at this early stage of the project, the design team introduced three milestones. The SDP Review milestone concluded the front-end activities. The other two milestones were Infrastructure Complete and Managers Complete: All development activities depended on the infrastructure milestone, and all Clients depended on the completion of the Managers.
整合继承的依赖项。设计团队尽可能将依赖项整合为继承的依赖项。例如,即使客户端需要 Message Bus,该依赖项也可以通过它们对管理器的依赖关系继承而来。
Consolidating inherited dependencies. The design team consolidated dependencies into inherited dependencies where possible. For example, even though the Clients require the Message Bus, that dependency could be inherited via their Manager dependencies.
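Consolidating inherited dependencies amounts to removing any direct dependency that is already implied through another predecessor (a transitive reduction of the network). The following is a minimal sketch under the assumption that the network has no cycles; the activity names `Some Client` and `Some Manager` are hypothetical placeholders.

```python
def ancestors(predecessors, activity):
    """Return every activity reachable by following predecessor links."""
    seen = set()
    stack = list(predecessors.get(activity, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(predecessors.get(current, ()))
    return seen


def consolidate_inherited(predecessors):
    """Drop direct dependencies already inherited via another predecessor."""
    reduced = {}
    for activity, direct in predecessors.items():
        inherited = set()
        for dep in direct:
            inherited |= ancestors(predecessors, dep)
        reduced[activity] = set(direct) - inherited
    return reduced


# Hypothetical fragment: the Client needs the Message Bus directly,
# but also inherits it through its Manager.
network = {
    "Some Client": {"Some Manager", "Message Bus"},
    "Some Manager": {"Abstract Manager"},
    "Abstract Manager": {"Message Bus"},
    "Message Bus": set(),
}
reduced = consolidate_inherited(network)
print(reduced["Some Client"])
```

After consolidation the Client keeps only its Manager dependency, so one arrow disappears from the diagram without changing the order in which work can start.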
在初始网络布局完成后,设计团队进行了以下健全性检查:
With the initial network laid out, the design team performed the following sanity checks:
已验证 TradeMe 项目具有单一开始活动和单一结束活动。
Verified the TradeMe project had a single start activity and a single end activity.
已验证项目中的每个活动都位于一条最终汇入关键路径某处的路径上。
Verified that every activity in the project resided on a path ending somewhere on the critical path(s).
已验证初始风险测量产生的风险数字相对较低。
Verified that the initial risk measurement yielded a relatively low risk number.
计算了没有任何资源分配的项目持续时间。结果为 7.8 个月,稍后将作为正常解决方案的重要检查。
Calculated the duration of the project without any resource assignment. This came to 7.8 months and would later serve as an important check of the normal solution.
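Two of these checks, the single start and end activities and the duration without resource assignment, are easy to automate. The sketch below uses a small hypothetical network (names and durations are invented for illustration) and computes the duration as the longest path through the dependencies.

```python
# Hypothetical network: activity -> (duration in days, predecessors).
network = {
    "Start": (0, []),
    "A": (10, ["Start"]),
    "B": (15, ["Start"]),
    "C": (20, ["A", "B"]),
    "End": (0, ["C"]),
}


def start_activities(net):
    """Activities with no predecessors."""
    return [a for a, (_, preds) in net.items() if not preds]


def end_activities(net):
    """Activities that no other activity depends on."""
    has_successor = {p for _, preds in net.values() for p in preds}
    return [a for a in net if a not in has_successor]


def earliest_finish(net):
    """Earliest finish of each activity, ignoring resource limits."""
    finish = {}

    def visit(activity):
        if activity not in finish:
            duration, preds = net[activity]
            finish[activity] = duration + max((visit(p) for p in preds), default=0)
        return finish[activity]

    for activity in net:
        visit(activity)
    return finish


finish = earliest_finish(network)
project_duration = max(finish.values())
print(start_activities(network), end_activities(network), project_duration)
```

The longest path here runs through B and C, so no staffing plan can finish this hypothetical network sooner than that path allows.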
该公司提供了以下规划假设:
The company provided the following planning assumptions:
核心团队。整个项目都需要核心团队。核心团队由一名架构师、一名项目经理和一名产品经理组成。核心团队很少被允许直接参与项目。此类工作包括架构师完成的关键高风险活动以及制作用户手册,后者分配给产品经理。
Core team. The core team was required throughout the project. The core team consisted of a single architect, a project manager, and a product manager. The core team was allowed to work directly on the project only infrequently. Such work included key high-risk activities done by the architect and producing the user manual, which was assigned to the product manager.
接触专家。该项目可以接触到专家或专业人士,例如测试工程师、数据库架构师和 UX/UI 设计师。
Access to experts. The project had access to experts or specialists, such as test engineers, DB architects, and UX/UI designers.
任务分配。开发人员按 1:1 的比例分配服务或其他编码活动。除了根据浮动时间分配外,TradeMe 还尽可能保持任务连续性(参见第 7 章)。
Assignments. There was a 1:1 assignment of developers to services or other coding activities. On top of assigning based on floats, whenever possible, TradeMe maintained task continuity (see Chapter 7).
质量控制。从施工开始到项目结束,只需要一名质量控制测试员。测试员仅在系统测试活动期间被视为直接成本。系统测试活动需要一名额外的测试员。
Quality control. A single quality control tester was required from the start of construction to the end of the project. The tester was treated as a direct cost only during the system test activity. One additional tester was required for the system testing activity.
构建和运营。从建设开始到项目结束,都需要一名构建、配置、部署和 DevOps 专家。
Build and operations. A single build, configuration, deployment, and DevOps specialist was required from start of construction until the end of the project.
开发人员。任务之间的开发人员被视为直接成本,而不是间接成本。TradeMe 的高质量期望消除了系统测试期间对开发人员的需求。
Developers. Developers between tasks were considered to be a direct cost rather than an indirect cost. TradeMe’s high quality expectations eliminated the need for developers during system testing.
表 13-4概述了项目每个阶段所需的角色。
Table 13-4 outlines which roles were required in each phase of the project.
表 13-4项目角色和阶段
Table 13-4 Roles and phases of the project
| 角色 Role | 前端 Front End | 基础设施 Infrastructure | 服务 Services | 测试 Testing |
|---|---|---|---|---|
| 架构师 Architect | X | X | X | X |
| 项目经理 Project Manager | X | X | X | X |
| 产品经理 Product Manager | X | X | X | X |
| 测试人员 Testers | | X | X | X |
| DevOps | | X | X | X |
| 开发人员 Developers | | X | X | |
将资源分配给各种活动会影响项目网络。在几个地方,网络除了活动之间的逻辑依赖关系外,还包含对资源的依赖关系。合并继承的依赖关系后,网络图如图13-1所示。
Assigning resources to the various activities affected the project network. In several places, the network included dependencies on the resources in addition to the logical dependencies between the activities. After consolidating the inherited dependencies, the network diagram looked like Figure 13-1.
图 13-1逻辑依赖关系网络图
Figure 13-1 Logical dependencies network diagram
图 13-1包含几个折叠的依赖关系(每个箭头用两个活动数字表示),这简化了图表而不影响其性质。此网络图最值得注意的方面是它包含两条关键路径。
Figure 13-1 contains a couple of collapsed dependencies (indicated by two activity numbers per arrow) that simplified the diagram without affecting its nature. The most notable aspect of this network diagram is that it contains two critical paths.
图 13-2显示了第一个正常解决方案的计划挣值。该解决方案的持续时间为 7.8 个月,表明人员配置没有延长关键路径。图 13-2中的图表具有浅 S 曲线的总体形状,但并不理想。该项目的开始相当顺利,但项目的后半部分并不十分平缓。陡峭的计划挣值曲线也反映在风险值略高的情况中。活动风险和关键性风险均为 0.7。
Figure 13-2 captures the planned earned value of the first normal solution. The duration of this solution stood at 7.8 months, indicating that the staffing assignments had not extended the critical path. The chart in Figure 13-2 has the general shape of a shallow S curve but is not ideal. The project starts reasonably well, but the second half of the project is not very shallow. The steep planned earned value curve was also reflected in the somewhat elevated risk values. Both the activity risk and criticality risk were 0.7.
图 13-2第一个正常解决方案计划挣值
Figure 13-2 The first normal solution planned earned value
图 13-3 显示了第一个正常解决方案的人员配置分布图。与计划挣值图一样,图 13-3 中的分布是有问题的。项目中部的明显峰值表明存在浪费,并且意味着对人员配置弹性的期望不切实际(参见第 7 章和图 7-10)。
Figure 13-3 shows the staffing distribution chart of the first normal solution. As with the planned earned value chart, the distribution in Figure 13-3 is problematic. The distinct peak at the center of the project indicates waste and implies an unrealistic expectation of staffing elasticity (see Chapter 7 and Figure 7-10).
图 13-3 第一个正常解决方案的人员配置分布
Figure 13-3 The first normal solution staffing distribution
根据人员分配,项目总成本为 59 个人月:直接成本 32 个人月,间接成本 27 个人月。直接成本高于间接成本,表明该解决方案很可能位于时间成本曲线的左侧,间接成本仍然较低。
Based on the staffing distribution, the project total cost came to 59 man-months: 32 man-months of direct cost and 27 man-months of indirect cost. The higher direct cost compared to the indirect cost indicated that this solution likely was very much to the left side of the time–cost curve, where the indirect cost is still low.
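The cost breakdown can be restated as a short calculation. This sketch only recombines the figures quoted above; the share computation is an illustration of why the direct cost dominates, not a metric defined in the text.

```python
# Cost figures quoted for the first normal solution (man-months).
direct_cost = 32.0
indirect_cost = 27.0

total_cost = direct_cost + indirect_cost

# Share of the total spent on direct work (illustrative ratio).
direct_share = direct_cost / total_cost

print(f"total = {total_cost:.0f} man-months, direct share = {direct_share:.0%}")
```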
计算出的项目效率为 32%。由于实际上限为 25%,如此高的效率值得怀疑。综合起来,直接成本高于间接成本、人员配置分布图中的明显峰值以及高效率强烈表明对人员配置弹性的假设过于激进。该解决方案预计,在所有并行网络路径中,资源始终会在正确的时间可用以保持进度。相当陡峭的计划挣值图直观地显示了这一预期。简而言之,首次尝试正常解决方案假设团队非常高效,可能过于高效以至于不切实际。
The calculated project efficiency was 32%. Since the upper practical limit is 25%, such high efficiency was questionable. Taken together, the direct cost higher than the indirect cost, the conspicuous peak in the staffing distribution chart, and the high efficiency strongly indicated overly aggressive assumptions about staffing elasticity. The solution expected that across all the parallel network paths, resources would always be available at the right time to maintain progress. The rather steep planned earned value chart visualized this expectation. In short, this first attempt at the normal solution assumed a very efficient team, likely one too efficient to be practical.
表 13-5 总结了第一个正常解决方案的项目指标。
Table 13-5 summarizes the project metrics of this first normal solution.
表 13-5 第一个正常解决方案的项目指标
Table 13-5 Project metrics of the first normal solution
| 项目指标 Project Metric | 值 Value |
|---|---|
| 持续时间(月) Duration (months) | 7.8 |
| 总成本(人月) Total cost (man-months) | 59 |
| 直接成本(人月) Direct cost (man-months) | 32 |
| 峰值人员配置 Peak staffing | 12 |
| 平均人员配置 Average staffing | 7.5 |
| 平均开发人员数 Average developers | 3.5 |
| 效率 Efficiency | 32% |
| 活动风险 Activity risk | 0.7 |
| 关键性风险 Criticality risk | 0.7 |
下一步是考虑加速项目的选项。由于存在两条关键路径,最好的做法是通过并行工作来压缩这个项目。
The next step was to consider options for accelerating the project. Due to the presence of the two critical paths, the best course of action was to compress this project by enabling parallel work.
从图 13-1中可以明显看出,管理器服务(活动 36、37、38、39)以及Regression Test Harness(活动 40)限制了两条关键路径以及两条近乎关键的路径。而客户端(活动 42、43、44、45)又依赖于所有管理器的完成,从而延长了项目。这使得客户端和管理器成为压缩的自然候选者。
From Figure 13-1, it was evident that the Manager services (activities 36, 37, 38, 39), along with the Regression Test Harness (activity 40), capped the two critical paths, as well as the two near-critical paths. The Clients (activities 42, 43, 44, 45), in turn, depended on the completion of all the Managers, prolonging the project. This made the Clients and the Managers natural candidates for compression.
对于每个管理服务,设计团队添加了以下活动,从而实现了压缩:
For each Manager service, the design team added the following activities, which enabled the compression:
一项契约设计活动,用于将客户端与管理器解耦。各项契约设计活动或许可以在 SDP 审查之后开始,但团队认为最好将它们推迟到基础设施完成之后。每个契约的预计工作量:5 天。
A contract design activity that decoupled the Clients from the Manager. The various contract design activities could perhaps have started after the SDP review, but it was deemed better to postpone them until after the infrastructure was complete. Estimated work per contract: 5 days.
管理器模拟器提供了管理器合约的足够好的实现。模拟器必须能够完全开发客户端,而客户端现在依赖于模拟器,而不是实际的管理器。模拟器不依赖于ResourceAccess或Engines等低级服务。模拟器只需要Manager合约来模拟和Message Bus。合约本身依赖于基础设施,其中包括Message Bus。每个模拟器的预计工作量:15 天。
A Manager simulator that provided a good-enough implementation of the Manager’s contract. The simulators had to enable full development of the Clients, which now depended on the simulators, not the actual Managers. The simulators had no dependencies on lower-level services such as ResourceAccess or Engines. The simulators needed only the Manager contracts to simulate and the Message Bus. The contracts themselves depended on the infrastructure, which included the Message Bus. Estimated work per simulator: 15 days.
一项专门的活动,针对管理器集成并重新测试客户端。集成活动取决于实际管理器及其客户端的完成情况。系统测试活动现在不仅需要完成客户端集成,还需要完成所有管理器集成。每个集成活动的估计工作量:5 天。
A dedicated activity that integrated and retested the Clients against the Managers. The integration activity depended on the completion of the actual Manager and its Clients. The system testing activity now required not just the Clients but also all the Manager integrations to be completed. Estimated work per integration activity: 5 days.
图 13-4显示了简化的网络图,其中用红色表示了与压缩相关的活动。在压缩网络中,管理器仅处于近乎关键的位置,并且按照与正常解决方案类似的时间线进行开发。最重要的变化(首先允许压缩)是客户端现在提前一个月完成。但是,由于管理器之后的额外集成活动,项目持续时间的缩短不到一个月。
Figure 13-4 shows a simplified network diagram capturing the compression-related activities in red. In the compressed network, the Managers were only near-critical and were developed on a similar timeline to the normal solution. The most important change (which allowed the compression in the first place) was that the Clients now completed a month sooner. However, the reduction in the duration of the project was less than a month because of the additional integration activities following the Managers.
图 13-4压缩解决方案简化网络图
Figure 13-4 Simplified network diagram for the compressed solution
管理器本身的持续时间估计保持不变。在正常解决方案中,每个管理器活动都必须在内部包含一些用于设计服务契约的投入。理论上,一旦设计团队将契约设计从管理器中提取为单独的活动,每个管理器所需的时间应该更少。然而在实践中,这种减少不太可能发生。拆分活动永远不会 100% 高效,并且不可避免地会因为需要理解契约及其对管理器内部实现的影响而损失一些工作量。为了弥补这些不足,设计团队将管理器的持续时间估计保持为与正常解决方案相同。
The duration estimation for the Managers themselves remained unchanged. In the normal solution, each Manager activity had to internally include some investment in designing the service contract. In theory, once the design team had extracted the contract design out of the Managers into separate activities, each Manager should have taken less time. However, in practice, this reduction is unlikely. Splitting activities is never 100% efficient, and inevitably some effort is lost due to the need to understand the contract and how it affected the internal implementation of the Managers. To compensate for these shortcomings, the design team kept the duration estimation of the Managers the same as for the normal solution.
压缩解决方案的其余步骤与正常解决方案几乎相同。但是,设计团队发现,他们可以在整个项目中减少两名开发人员,方法是让架构师负责一项开发活动,并将进度推迟一周。考虑到在争取更多开发人员方面面临的挑战,该公司认为,以减少人员换取轻微的延迟是可以接受的。压缩解决方案的持续时间为 7.1 个月,与正常解决方案(7.8 个月)相比,加快了 3 周(9%)。新资源确实消耗了更多的浮动时间,项目的新风险数为 0.74。
The rest of the steps for the compressed solution were virtually identical to those for the normal solution. However, the design team discovered they could reduce the staff by two developers throughout the project by using the architect for one development activity and by pushing the schedule out one week. The company judged trading the slight delay for the reduced staff as acceptable given its challenge in securing more developers. The duration of the compressed solution came in at 7.1 months, a 3-week (9%) acceleration compared with the normal solution (7.8 months). The new resources did consume more of the floats, and the new risk number for the project was 0.74.
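The quoted acceleration follows directly from the two durations. The sketch below assumes an average of 52/12 weeks per month for the conversion, which is an assumption made here and not a figure from the text.

```python
normal_months = 7.8
compressed_months = 7.1

saved_months = normal_months - compressed_months
acceleration_percent = saved_months / normal_months * 100

# Assumption: an average month has 52 / 12 weeks.
weeks_per_month = 52 / 12
saved_weeks = saved_months * weeks_per_month

print(f"{saved_weeks:.1f} weeks saved, {acceleration_percent:.1f}% faster")
```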
图 13-5显示了压缩解决方案的计划挣值。曲线现在在项目结束时略微变细,比正常解决方案中的曲线要好。
Figure 13-5 shows the planned earned value for the compressed solution. The curve now tapers somewhat at the end of the project, better than that in the normal solution.
图 13-5 压缩解决方案的计划挣值
Figure 13-5 Planned earned value for the compression solution
图 13-6显示了压缩解决方案的人员配置分布。人员配置分布大部分看起来都比较稳定。最初从 3 人增加到 12 人有点困难,但还是可以做到的。峰值人员配置为 12 人,与正常解决方案相同。平均人员配置为 8.2 人,而正常解决方案为 7.5 人。
Figure 13-6 shows the staffing distribution of the compressed solution. The staffing distribution looks solid for the most part. The initial ramp-up from 3 to 12 people is a bit challenging, but doable. Peak staffing of 12 is the same as the normal solution. Average staffing is at 8.2, compared with 7.5 of the normal solution.
图 13-6压缩解决方案的人员分配
Figure 13-6 Staffing distribution for the compressed solution
压缩解决方案的成本为 58.5 人月,略低于正常解决方案的成本 59 人月。直接成本为 36.7 人月,而正常解决方案的直接成本为 32 人月。尽管此项目设计解决方案速度更快且成本更低,但与正常解决方案的真正区别在于预期的项目效率 — 37%。如果正常解决方案的 32% 效率需要一支高效的团队,那么压缩解决方案则需要一支英雄团队。再加上 0.74 的高风险,这种压缩解决方案注定会令人失望。
The cost of the compressed solution came in at 58.5 man-months, slightly less than the normal solution’s cost of 59 man-months. Direct cost was 36.7 man-months compared with the normal solution’s direct cost of 32 man-months. Although this project design solution was faster and at lower cost, the real difference from the normal solution was the expected project efficiency—37%. If the normal solution’s efficiency of 32% required a highly efficient team, the compressed solution demanded nothing less than a team of heroes. Combined with the elevated risk of 0.74, this compressed solution was a disappointment waiting to happen.
表 13-6 总结了压缩解决方案的指标。压缩解决方案使本来就具有挑战性的项目(见图 13-1)更具挑战性,并产生了不切实际的高效率期望。然而,它的主要缺点在于集成,而不是执行复杂性的增加。项目接近尾声时发生的多个并行集成没有留下任何回旋余地。如果其中任何一个出现问题,团队就没有时间进行修复。以执行复杂性和集成风险的增加换取不到一个月的压缩,并不是一笔划算的交易。
Table 13-6 summarizes the metrics of the compressed solution. The compressed solution made an already challenging project (see Figure 13-1) more challenging and created an unrealistically high efficiency expectation. Its major downside, however, was the integration—not the increase in execution complexity. The multiple, parallel integrations occurring near the end of the project offered no leeway. If any of them went awry, the team had no time for repairs. The increase in both the execution complexity and the integration risk in exchange for less than a month of compression was not a good trade.
表13-6压缩方案项目指标
Table 13-6 Project metrics of the compressed solution
| 项目指标 Project Metric | 值 Value |
|---|---|
| 持续时间(月) Duration (months) | 7.1 |
| 总成本(人月) Total cost (man-months) | 58.5 |
| 直接成本(人月) Direct cost (man-months) | 36.7 |
| 峰值人员配置 Peak staffing | 12 |
| 平均人员配置 Average staffing | 8.2 |
| 平均开发人员数 Average developers | 4.7 |
| 效率 Efficiency | 37% |
| 活动风险 Activity risk | 0.73 |
| 关键性风险 Criticality risk | 0.75 |
尽管如此,这次压缩尝试并非浪费时间——它证明了压缩解决方案是徒劳的。压缩解决方案还帮助设计团队更好地了解项目,并在时间成本曲线上提供了另一个点。
Even so, this compression attempt was not a waste of time—it proved the compressed solution would be an exercise in futility. The compressed solution also helped the design team to better understand the project and provided another point on the time–cost curve.
第一个常规解决方案的主要问题不是效率不切实际,而是项目网络的复杂性。只需检查图 13-1中的(已简化的)网络图,就可以明显看出这种复杂性。网络的圈复杂度为 33 个单位。再加上团队期望的高效率,这意味着很高的执行风险。
The main problem with the first normal solution was not the unrealistic efficiency but the complexity of the project network. That complexity is evident just by examining the (already simplified) network diagram in Figure 13-1. The cyclomatic complexity of the network is 33 units. Coupled with the high efficiency expected of the team, this implied a high execution risk.
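Cyclomatic complexity of a network is commonly computed as E - N + 2P, where E is the number of dependencies, N the number of activities, and P the number of disconnected parts. The sketch below applies that standard formula to a small hypothetical network; it is assumed, not confirmed by this section, that the 33-unit figure was obtained the same way.

```python
def cyclomatic_complexity(activities, dependencies):
    """Return E - N + 2P for a network of activities and dependency arrows."""
    # Count connected parts (P), treating arrows as undirected links.
    neighbors = {a: set() for a in activities}
    for before, after in dependencies:
        neighbors[before].add(after)
        neighbors[after].add(before)

    unvisited = set(activities)
    parts = 0
    while unvisited:
        parts += 1
        stack = [unvisited.pop()]
        while stack:
            node = stack.pop()
            for nxt in neighbors[node]:
                if nxt in unvisited:
                    unvisited.remove(nxt)
                    stack.append(nxt)

    return len(dependencies) - len(activities) + 2 * parts


# Hypothetical network: 5 activities, 6 dependencies, 1 connected part.
activities = [1, 2, 3, 4, 5]
dependencies = [(1, 2), (1, 3), (2, 4), (3, 4), (2, 5), (4, 5)]
complexity = cyclomatic_complexity(activities, dependencies)
print(complexity)
```

A pure chain of activities scores 1 under this formula, so every extra parallel arrow raises the number, which is why a heavily interconnected network scores so much higher than a string of sequential pulses.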
设计团队没有直面高复杂性,而是选择按架构层重新设计项目,而不是按活动之间的逻辑依赖关系。这主要产生了一系列活动脉冲。脉冲对应于架构的各层或项目的各个阶段:前端、基础设施和基础工作、资源、资源访问、引擎、管理器、客户端和发布活动(图 13-7)。
Instead of confronting the high complexity, the design team chose to redesign the project by architecture layers, as opposed to the logical dependencies between the activities. This produced mostly a string of pulses of activities. The pulses corresponded to the layers of the architecture or the phase of the project: front end, infrastructure and foundational work, Resources, ResourceAccess, Engines, Managers, Clients, and release activities (Figure 13-7).
图 13-7分层设计网络图
Figure 13-7 Design-by-layers network diagram
虽然脉冲是序列化的,并且彼此连续,但在内部,脉冲是并行进行的。在图 13-7中,除了展开的经理脉冲外,所有脉冲都折叠起来。一些剩余的支持活动(例如UI Design和Test Harness)不是脉冲串的一部分,但它们的浮动时间非常高。
While the pulses were serialized and sequential to each other, internally the pulses were done in parallel. In Figure 13-7, all the pulses are collapsed except for the expanded Manager’s pulse. A few remaining support activities, such as UI Design and the Test Harness, were not part of the string of pulses, but they had very high float.
图 13-7的一个显而易见的方面是,与图 13-1相比,该网络有多么简单。由于脉冲在时间上是连续的,项目经理只需应对每个脉冲及其支持活动的复杂性。在 TradeMe 中,单个脉冲的复杂性为 2、4、5、4、4、4、4 和 2。支持活动的复杂性为 1,由于其高浮动,对执行复杂性几乎没有影响。
An instantly noticeable aspect of Figure 13-7 is how simple that network is compared with that of Figure 13-1. Since the pulses were sequential in time, the project manager would only have to contend with the complexity of each pulse and its support activities. In TradeMe, the complexity of the individual pulses was 2, 4, 5, 4, 4, 4, 4, and 2. The complexity of the support activities was 1 and, due to their high float, had essentially no effect on the execution complexity.
如第 12 章所述,分层设计会产生风险更高的项目。设计团队发现,对于 TradeMe,分层设计解决方案的风险为 0.76,高于原始常规解决方案(使用依赖项设计)的 0.7。如果忽略高浮动支持活动,风险甚至会更高,达到 0.79。
As discussed in Chapter 12, design by layers yields riskier projects. The design team found that with TradeMe, the risk of the design-by-layers solution was 0.76, up from the 0.7 of the original normal solution (which used design by dependencies). The risk went even higher to 0.79 when ignoring the high-float support activities.
图 13-8显示了分层设计解决方案的计划人员分配情况。人员分配图的整体形状令人满意。该项目只需要 4 名开发人员,人员最多时达到 11 人。
Figure 13-8 shows the planned staffing distribution for the design-by-layers solution. The overall shape of the staffing distribution chart was satisfactory. The project needed only 4 developers, and staffing peaked at 11 people.
图 13-8分层设计人员分布
Figure 13-8 Design-by-layers staffing distribution
表 13-7显示了分层设计 TradeMe 的项目指标。
Table 13-7 shows the project metrics for designing TradeMe by layers.
表 13-7分层设计解决方案的项目指标
Table 13-7 Project metrics for the design-by-layers solution
| 项目指标 Project Metric | 值 Value |
|---|---|
| 持续时间(月) Duration (months) | 8.1 |
| 总成本(人月) Total cost (man-months) | 60.8 |
| 直接成本(人月) Direct cost (man-months) | 32.2 |
| 峰值人员配置 Peak staffing | 11 |
| 平均人员配置 Average staffing | 7.5 |
| 平均开发人员数 Average developers | 3.4 |
| 效率 Efficiency | 31% |
| 活动风险 Activity risk | 0.75 |
| 关键性风险 Criticality risk | 0.76 |
分层设计解决方案需要四名开发人员。该公司担心如果无法找到这四名开发人员会发生什么。因此,调查亚临界的影响非常重要。规划假设仍然允许接触外部专家。
The design-by-layers solution called for four developers. The company was concerned about what would happen if it was unable to get those four developers. It was therefore important to investigate the implications of going subcritical. The planning assumptions still allowed for access to external experts.
对于这个项目,任何少于四名开发人员的分层设计解决方案都将成为次临界解决方案,因此设计团队选择探索双开发人员解决方案。这些开发人员也被分配了数据库设计。次临界网络图类似于图 13-7中的网络图,只是内部每个脉冲仅由两个并行的活动串组成。
For this project, any design-by-layers solution with fewer than four developers became subcritical, so the design team chose to explore a two-developer solution. These developers were assigned the database design as well. The subcritical network diagram was similar to the one in Figure 13-7 except that internally each pulse consisted of only two parallel strings of activities.
亚临界解决方案将项目延长至 11.1 个月。计划挣值曲线(如图 13-9 所示)几乎是一条直线,其线性回归趋势线的 R² 为 0.98。
The subcritical solution extended the project to 11.1 months. The planned earned value curve (shown in Figure 13-9) was almost a straight line whose linear regression trend line had an R² of 0.98.
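The R² of a linear trend line measures how close the planned earned value curve is to a straight line: a value near 1 means almost no S shape. The sketch below computes it with ordinary least squares; the sample points are hypothetical and are not the TradeMe data.

```python
def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical planned earned value (percent) sampled every two months.
months = [0, 2, 4, 6, 8, 10]
earned_value = [0, 15, 35, 55, 80, 100]

print(f"R^2 = {r_squared(months, earned_value):.3f}")
```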
图 13-9亚临界计划进度
Figure 13-9 Subcritical planned progress
该解决方案的风险指数为 0.84,这也反映出其亚临界性质。如果公司不得不采用这一方案,设计团队建议对项目进行至少一个月的减压(即延长工期)。减压会使工期进入 12 个月左右的范围,比分层设计解决方案长 50% 或更多。
The subcritical nature of the solution was also reflected by its risk index of 0.84. If the company had to pursue this option, the design team recommended decompressing the project by at least a month. Decompression pushed the schedule into the 12-month range, 50% or more longer than the design-by-layers solution.
亚临界解决方案的总成本为 74.1 人月,直接成本为 30.4 人月,预期效率为 25% 更为合理。人员分布图(未显示)缺少中心的驼峰,这是亚临界解决方案的典型特征(参见第 7 章)。
The total cost of the subcritical solution was 74.1 man-months, with a direct cost of 30.4 man-months, and the expected efficiency was more reasonable at 25%. The staffing distribution chart (not shown) was missing the hump in the center, as is typical for subcritical solutions (see Chapter 7).
表 13-8显示了亚临界解决方案的项目指标。
Table 13-8 shows the project metrics for the subcritical solution.
表 13-8亚临界解决方案的项目指标
Table 13-8 Project metrics for the subcritical solution
| 项目指标 Project Metric | 值 Value |
|---|---|
| 持续时间(月) Duration (months) | 11.1 |
| 总成本(人月) Total cost (man-months) | 74.1 |
| 直接成本(人月) Direct cost (man-months) | 30.4 |
| 峰值人员配置 Peak staffing | 9 |
| 平均人员配置 Average staffing | 6.7 |
| 平均开发人员数 Average developers | 2 |
| 效率 Efficiency | 25% |
| 活动风险 Activity risk | 0.85 |
| 关键性风险 Criticality risk | 0.82 |
亚临界时间和成本指标(11.1 个月和 74.1 人月)与总体估算(10.5 个月和 74.6 人月)相比更为有利,持续时间相差约 5%,成本相差不到 1%。这种相关性表明亚临界解决方案数字是该项目的可能选择。更现实的 25% 效率也使亚临界解决方案更具可信度。
The subcritical time and cost metrics (11.1 months and 74.1 man-months) compared favorably to those for the overall estimation (10.5 months and 74.6 man-months), differing by about 5% in duration and less than 1% in cost. This correlation suggested that the subcritical solution numbers were the likely option for the project. The more realistic 25% efficiency also gave credence to the subcritical solution.
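The comparison with the broadband estimation can be reproduced as relative differences. This sketch assumes the differences are measured against the overall estimation, which is a choice made here for illustration.

```python
# Overall (broadband) estimation versus the subcritical solution.
estimated_months, estimated_cost = 10.5, 74.6
subcritical_months, subcritical_cost = 11.1, 74.1

# Assumption: differences are expressed relative to the overall estimation.
duration_diff_percent = abs(subcritical_months - estimated_months) / estimated_months * 100
cost_diff_percent = abs(subcritical_cost - estimated_cost) / estimated_cost * 100

print(f"duration differs by {duration_diff_percent:.1f}%, cost by {cost_diff_percent:.1f}%")
```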
分析表 13-5和表 13-7的结果,可以发现几个有说服力的观察结果。首先,无论团队采用分层设计还是依赖关系设计,项目持续时间基本保持不变。如第 12 章所述,这种相似性是意料之中的。毕竟,基于调用链的依赖关系主要是层的产物,项目持续时间由通过层的最长路径决定。此外,开发人员的平均人员配备水平和效率没有变化。主要差异与分层设计解决方案大幅降低的执行复杂性和更高的风险有关。
Analyzing the results of Table 13-5 and Table 13-7 revealed several telling observations. First, the project duration remained largely the same regardless of whether the team used design by layers or design by dependencies. As explained in Chapter 12, this similarity was expected. After all, call chain–based dependencies are principally a product of the layers, and the project duration is dictated by the longest path through the layers. Also, the average staffing level of developers and efficiency were unchanged. The main differences related to the drastically reduced execution complexity and the higher risk with the design-by-layers solution.
简而言之,对于 TradeMe 来说,分层设计在各个方面都与第一种常规解决方案相当甚至更好,除了风险。即使分层设计解决方案成本更高、耗时更长,但其执行简单性使其成为 TradeMe 的不二之选。分层设计解决方案也远胜于由此衍生的亚临界解决方案。亚临界解决方案成本更高、耗时更长、风险更大。设计团队采用分层设计解决方案作为其余分析的常规解决方案。
In short, for TradeMe, design by layers was comparable to or better than the first normal solution in every respect except risk. Even if the design-by-layers solution had cost more and taken longer, its execution simplicity made it the obvious choice for TradeMe. The design-by-layers solution was also far better than the subcritical solution derived from it. The subcritical solution cost more, took longer, and was riskier. The design team adopted the design-by-layers solution as the normal solution for the remainder of the analysis.
此时,设计团队已经提出了构建系统的四种解决方案:压缩解决方案、按依赖关系的常规解决方案、按分层的常规解决方案以及分层设计解决方案的亚临界选项。由于亚临界解决方案是分层设计解决方案的后备方案,因此设计团队将其排除在风险分析之外。
At this point the design team had produced four solutions for building the system: the compressed solution, the normal solution by dependencies, the normal solution by layers, and the subcritical option of the design-by-layers solution. Since the subcritical solution was a fallback position for the design-by-layers solution, the design team excluded it from the risk analysis.
分层设计方案的风险和临界脉冲较高,设计团队通过风险减压来缓解这些风险。由于不知道合适的减压量,设计团队尝试了 1 周、2 周、4 周、6 周和 8 周的减压,并观察了风险行为。表 13-9显示了三个设计方案和五个减压点的风险值。
The design-by-layers solution had elevated risk and critical pulses, which the design team mitigated by using risk decompression. Since the appropriate amount of decompression was unknown, the design team tried decompressing by 1 week, 2 weeks, 4 weeks, 6 weeks, and 8 weeks, and observed the risk behavior. Table 13-9 shows the risk values of the three design options and the five decompression points.
表 13-9各选项及减压点的风险值
Table 13-9 Risk values for the options and decompression points
| 选项 Option | 持续时间(月) Duration (months) | 临界风险 Criticality Risk | 活动风险 Activity Risk |
|---|---|---|---|
| 压缩 Compressed | 7.1 | 0.75 | 0.73 |
| 按依赖项设计 Design by Dependencies | 7.8 | 0.70 | 0.70 |
| 分层设计 Design by Layers | 8.1 | 0.76 | 0.75 |
| D1 | 8.3 | 0.60 | 0.65 |
| D2 | 8.5 | 0.48 | 0.57 |
| D3 | 9.0 | 0.42 | 0.46 |
| D4 | 9.4 | 0.27 | 0.39 |
| D5 | 9.9 | 0.27 | 0.34 |
图 13-10 按时间线绘制了这些选项和减压点。临界风险的表现符合预期,风险随着减压沿某种逻辑函数下降。活动风险也随着减压而下降,但两条曲线之间出现了差距,因为活动风险模型对浮动时间分布不均匀的情况反应不佳。产生表 13-9 中数值的计算通过调整浮动时间异常值(如第 11 章所述)来处理这个问题,即用浮动时间的平均值加上一个标准差来替换异常值。在这种情况下,这一调整并不足够。采用半个标准差的浮动时间调整可以使两条曲线完全对齐。然而,设计团队选择只使用不需要任何调整的临界风险曲线。团队观察到,由于风险曲线正在趋于平稳,超过 D4 的减压是过度的。
Figure 13-10 plots these options and decompression points against the timeline. The criticality risk behaved as expected, and the risk dropped with decompression along some logistic function. The activity risk also dropped with decompression, but a gap appeared between the two curves because the activity risk model did not respond well to an uneven distribution of the floats. The calculations that produced the values in Table 13-9 addressed this issue by adjusting the float outliers as described in Chapter 11—that is, by replacing the outliers with the average of the floats plus one standard deviation of the floats. In this case, the adjustment was simply insufficient. A float adjustment at half a standard deviation aligned the curves perfectly. However, the design team chose to just use the criticality risk curve, which did not require any adjustments. The team observed that decompression beyond D4 was excessive because the risk curve was leveling out.
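The float-outlier adjustment mentioned above can be sketched in a few lines. This is only an illustration, not the calculation behind Table 13-9: the float values are made up, the function name `adjust_float_outliers` is hypothetical, and using the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def adjust_float_outliers(floats, num_stdevs=1.0):
    """Replace every float above mean + num_stdevs * stdev with that threshold.

    Assumption: the population standard deviation is used; the text does
    not say which variant applies.
    """
    threshold = mean(floats) + num_stdevs * pstdev(floats)
    return [min(f, threshold) for f in floats]

# Hypothetical activity floats in days; 100 is the outlier.
floats = [5, 5, 10, 10, 15, 100]

one_sigma = adjust_float_outliers(floats, 1.0)    # adjustment at one standard deviation
half_sigma = adjust_float_outliers(floats, 0.5)   # tighter adjustment at half a standard deviation
```

With these sample values only the outlier is capped, and the half-standard-deviation variant caps it at a lower value, which is why it pulls the activity risk curve closer to the criticality risk curve.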
图 13-10 离散风险曲线
Figure 13-10 Discrete risk curves
利用表 13-9 中的值,设计团队为风险曲线找到了一个多项式相关模型,R² 为 0.96:
With the values in Table 13-9, the design team found a polynomial correlation model for the risk curve with R² of 0.96:
其中 t 以月为单位。
where t is measured in months.
使用风险模型,最大风险为 7.4 个月,风险值为 0.78。这个点介于按依赖性设计解决方案的 7.8 个月和压缩解决方案的 7.1 个月之间(见图13-11)。设计团队将压缩解决方案排除在考虑范围之外,因为它已经超过了最大风险点。即使是按依赖性设计解决方案,风险也处于临界点:7.8个月时,风险已经达到推荐的最大值 0.75。分层设计解决方案的风险为 0.68,比较舒适。最小风险点位于 9.7 个月,风险值为 0.25。
Using the risk model, maximum risk was at 7.4 months, with a risk value of 0.78. This point was between the design-by-dependencies solution’s 7.8 months and the compressed solution’s 7.1 months (see Figure 13-11). The design team removed the compressed solution from consideration because it was past the point of maximum risk. Even the design-by-dependencies solution was borderline risk-wise: At 7.8 months, the risk was already 0.75, the maximum recommended value. The design-by-layers solution was at a comfortable 0.68 risk. The point of minimum risk was at 9.7 months with a risk value of 0.25.
图 13-11 风险模型曲线和关注点
Figure 13-11 Risk model curve and points of interest
表 13-10捕获了这些点的风险值,图 13-11沿着风险模型曲线对它们进行了可视化。
Table 13-10 captures the risk value of these points, and Figure 13-11 visualizes them along the risk model curve.
表 13-10风险模型值和兴趣点
Table 13-10 Risk model values and points of interest
| 选项 Option | 持续时间(月) Duration (months) | 风险模型 Risk Model |
|---|---|---|
| 压缩 Compressed | 7.1 | 0.75 |
| 最大风险 Maximum Risk | 7.4 | 0.78 |
| 按依赖项设计 Design by Dependencies | 7.8 | 0.74 |
| 分层设计 Design by Layers | 8.1 | 0.68 |
| 最低直接成本 Minimum Direct Cost | 8.46 | 0.56 |
| D2 | 8.53 | 0.53 |
| 最低减压目标 Minimum Decompression Target | 8.6 | 0.52 |
| D3 | 9.0 | 0.38 |
| 最低风险 Minimum Risk | 9.7 | 0.25 |
使用第 12 章中解释的技术,设计团队计算出最小风险减压目标(风险曲线的二阶导数为零处)位于 8.6 个月,风险值为 0.52。该点位于 D2 和 D3 减压点之间(见图 13-10),因此其右侧的点 D3 成为推荐的减压目标。根据风险模型,D3 持续时间处的风险为 0.38,略低于 D3 的实际值 0.42。虽然减压目标的风险值可能看起来偏低(明显低于理想的 0.5),但它符合第 12 章的建议,即将分层设计项目减压至 0.4 以补偿其固有风险。
Using the technique explained in Chapter 12, the design team calculated the minimum risk decompression target (where the risk curve’s second derivative is zero) at 8.6 months, with a risk value of 0.52. This point lay between the D2 and D3 decompression points (see Figure 13-10), making the point to its right, D3, the recommended decompression target. The risk at the duration of D3 was 0.38 on the risk model, slightly less than the actual value of 0.42 for D3. While the risk value for the decompression target may seem low (significantly less than the ideal 0.5), it was in line with the recommendation in Chapter 12 to decompress design-by-layers projects to 0.4 to compensate for their inherent risk.
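The relationship between the points of interest can be checked with a small sketch. It does not use the team's fitted coefficients. Instead it assumes the risk model is a cubic and rebuilds one from the two stated extrema (maximum risk 0.78 at 7.4 months, minimum risk 0.25 at 9.7 months). The function name `risk` and the cubic shape are assumptions made for illustration.

```python
# Points of interest quoted in the text: (duration in months, risk).
T_MAX, R_MAX = 7.4, 0.78   # point of maximum risk
T_MIN, R_MIN = 9.7, 0.25   # point of minimum risk

def risk(t):
    """Cubic with zero slope at both stated extrema (an assumed shape)."""
    s = (t - T_MAX) / (T_MIN - T_MAX)   # 0 at maximum risk, 1 at minimum risk
    return R_MAX + (R_MIN - R_MAX) * (3 * s**2 - 2 * s**3)

# For any cubic, the second derivative is zero midway between the two extrema.
t_inflection = (T_MAX + T_MIN) / 2
risk_at_inflection = risk(t_inflection)

# Risk on the reconstructed curve at the D3 duration.
risk_at_d3 = risk(9.0)
```

Under these assumptions the inflection point falls at about 8.55 months with a risk of about 0.515, consistent with the 8.6 months and 0.52 quoted above, and the curve gives roughly 0.37 at 9.0 months against the 0.38 in Table 13-10. The small gaps are expected, since the inputs are rounded and the reconstructed curve is not the fitted model.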
寻找减压目标的最后一项技术是计算最小直接成本点。然而,减压点的直接成本是未知的。
The last technique brought to bear on finding the decompression target was calculating the point of minimum direct cost. However, the direct cost at the decompression points was unknown.
通过查看图 13-8和表 13-7,设计团队保守估计,减压需要四名开发人员中的三名在减压期间继续工作。这使团队能够计算出将项目延伸到D5减压点的直接成本。设计团队将额外的直接成本添加到分层设计解决方案的已知直接成本中,从而提供了直接成本曲线和拟合良好的相关模型:
Examining Figure 13-8 and Table 13-7, the design team conservatively estimated that the decompression required three out of the four developers to keep working during the decompression. This allowed the team to calculate the direct cost for extending the project to the D5 decompression point. The design team added that extra direct cost to the known direct cost of the design-by-layers solution, which provided a direct cost curve and a well-fitted correlation model:
设计团队利用直接成本公式,发现最小直接成本点位于 8.46 个月,恰好在 D2 之前。将 8.46 个月的持续时间代入风险公式,得出的风险为 0.56。直接成本模型的最小点与风险模型二阶导数的零点之间的持续时间差异为 1%,从而确认 D3 为减压目标。顺便说一句,最小直接成本为 31.4 人月,而 D3 处的直接成本为 32.2 人月,差异仅为 3%。
Using the direct cost formula, the design team found the point of minimum direct cost at 8.46 months, right before D2. Substituting the 8.46-month duration into the risk formula provided a risk of 0.56. The duration difference between the minimum point of the direct cost model and the zero point of the second derivative of the risk model was 1%, confirming D3 as the decompression target. Incidentally, the minimum direct cost was 31.4 man-months, while the direct cost at D3 was 32.2 man-months, a difference of merely 3%.
推荐 D3 要求设计团队提供该点的总成本。虽然直接成本可以从之前的公式得知,但整个减压范围内的间接成本是未知的。设计团队对三种已知解决方案的间接成本进行了建模,得到了一条简单的直线,由以下公式描述:
Recommending D3 required the design team to provide the total cost at that point. While the direct cost was known from the prior formula, the indirect cost was unknown across the decompression range. The design team modeled the indirect cost for the three known solutions, obtaining a simple straight line described by the following formula:
设计团队将直接和间接成本方程加在一起,得出了系统总成本的公式:
The design team added the direct and indirect cost equations together to come up with the formula for the total cost in the system:
使用此公式,D3 处的总成本为 67.6 人月。
Using this formula, the total cost at D3 was 67.6 man-months.
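Because total cost is the sum of direct and indirect cost, the figures quoted for D3 imply its indirect cost. The following is a small arithmetic check on the quoted numbers only. The cost formulas themselves are not reproduced here, and the variable names are illustrative.

```python
# Figures quoted in the text, all in man-months.
total_cost_d3 = 67.6
direct_cost_d3 = 32.2
min_direct_cost = 31.4

# Total cost = direct cost + indirect cost, so the indirect cost at D3 follows.
indirect_cost_d3 = round(total_cost_d3 - direct_cost_d3, 1)

# How far the direct cost at D3 sits above the minimum direct cost.
premium_over_min = (direct_cost_d3 - min_direct_cost) / min_direct_cost

print(f"indirect cost at D3: {indirect_cost_d3} man-months")
print(f"direct cost premium at D3: {premium_over_min:.1%}")
```

The implied indirect cost at D3 is 35.4 man-months, and the direct cost at D3 is roughly 2.5% above the minimum, which the text rounds to 3%.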
到目前为止,最好的项目设计方案是 D3,即在分层设计解决方案基础上减压一个月。它提供了一个简单、可实现的项目,风险更低,并且直接成本几乎处于最低点。较低的间接成本使该解决方案从持续时间、成本和风险角度来看成为项目的最佳选择。
The best project design option so far was D3, the one-month decompression from the design-by-layers solution. It provided a simple, achievable project at reduced risk and virtually at minimum direct cost. The somewhat low indirect cost made this solution the optimal option for the project from duration, cost, and risk perspectives.
除了这个最佳点之外,设计团队还向公司的决策者展示了依赖关系设计解决方案。它表明,由于复杂性高和团队预期效率不切实际,任何缩短进度的尝试都会大大增加设计风险和执行风险。
In addition to this optimal point, the design team presented the design-by-dependencies solution to the company’s decision makers. That option demonstrated that any attempt to shorten the schedule would drastically increase the design risk and the execution risk because of the high complexity and the unrealistic efficiency expected of the team.
由于可能存在资源短缺,设计团队发现有必要采用亚临界解决方案,但必须进行充分减压。重复与分层设计解决方案类似的步骤,减压亚临界解决方案的风险为 0.47,持续时间为 11.8 个月,总成本为 79.5 人月。提出减压亚临界解决方案既是为了展示项目人员不足的后果,也是为了表明如果有需要,该项目仍然可行。
Because of the potential resource shortage, the design team found it necessary to include the subcritical solution, but only with adequate decompression. Repeating steps similar to those used for the design-by-layers solution, the decompressed subcritical solution provided a risk of 0.47, a duration of 11.8 months, and a total cost of 79.5 man-months. The decompressed subcritical solution was presented both to show the consequences of understaffing the project and to show that the project was still feasible, if the need should arise.
由于风险较高,分层设计和亚临界解决方案的非减压方案没有意义。表 13-11总结了设计团队在 SDP 审查中提出的项目设计方案。
Due to their higher risk, there was no point in considering the non-decompressed options of the design-by-layers and subcritical solutions. Table 13-11 summarizes the project design options that the design team presented at the SDP review.
表 13-11可行的项目设计选项
Table 13-11 Viable project design options
| 项目选项 Project Option | 持续时间(月) Duration (months) | 总成本(人月) Total Cost (man-months) | 风险 Risk | 复杂性 Complexity |
|---|---|---|---|---|
| 活动驱动 Activity Driven | 8 | 61 | 0.74 | 高 High |
| 架构驱动 Architecture Driven | 9 | 68 | 0.38 | 低 Low |
| 人手不足 Understaffed | 12 | 80 | 0.47 | 低 Low |
为了便于演示,设计团队重新命名了设计选项,以避免使用诸如“正常”、“减压”、“亚临界”和“分层”等项目设计术语。在表 13-11中,标签“活动驱动”代表按依赖项设计,“架构驱动”代表按分层设计,“人手不足”代表亚临界。
For the presentation, the design team renamed the design options to avoid project design jargon such as “normal,” “decompression,” “subcritical,” and “by layers.” In Table 13-11, the label “Activity Driven” stands for design by dependencies, “Architecture Driven” stands for design by layers, and “Understaffed” stands for subcritical.
该表格使用“高”和“低”等通俗易懂的术语来表示复杂性,并对风险值以外的所有数字进行了四舍五入。该表格温和地促使决策者采用解压式分层设计解决方案。
The table used plain-language terms such as “High” and “Low” for complexity and rounded all numbers other than the risk values. The table gently prodded the decision makers toward the decompressed design-by-layers solution.
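The renaming and rounding applied for the presentation can be sketched as a small transformation. The dictionary keys, the `present` helper, and the half-up rounding rule are assumptions for illustration. Only the two options whose unrounded figures appear in the text are shown.

```python
# Analysis names mapped to the jargon-free labels used in Table 13-11.
LABELS = {
    "design by dependencies": "Activity Driven",
    "design by layers (D3)": "Architecture Driven",
    "subcritical (decompressed)": "Understaffed",
}

def round_half_up(x):
    """Round a non-negative number to the nearest integer, halves going up."""
    return int(x + 0.5)

def present(name, duration_months, total_cost, risk):
    """Relabel an option and round everything except the risk value."""
    return (LABELS[name], round_half_up(duration_months),
            round_half_up(total_cost), risk)

rows = [
    present("design by layers (D3)", 9.0, 67.6, 0.38),
    present("subcritical (decompressed)", 11.8, 79.5, 0.47),
]
for row in rows:
    print(row)
```

The two resulting rows match the Architecture Driven and Understaffed lines of Table 13-11.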
前面的章节重点介绍了项目设计的技术方面。当然,你可以将项目设计视为一项技术设计任务。在从事项目设计数十年后,我发现它实际上是一种心态,而不仅仅是一种专业知识。你不应该只是计算风险或成本,然后努力履行承诺。你必须努力在项目的每个方面都占据绝对优势。你应该为项目可能给你带来的一切做好准备——这需要超越机制和数字。你应该采取一种整体方法,包括你的个性和态度、你与管理层和开发人员的互动方式,以及对设计对开发过程和产品生命周期的影响的认识。我在本书的两个部分中为系统和项目设计提出的理念为软件工程打开了一扇通往卓越水平的大门。你必须保持这扇大门的开放,不断改进,完善这些想法,发展自己的风格,并适应变化。本章的最后一章建议了你应该如何处理这些方面,但更重要的是向你展示了如何继续这段旅程。
The previous chapters focused on the technical aspects of designing a project. Certainly, you can view project design as a technical design task. After practicing project design for decades, I find that it is actually a mindset, not just an expertise. You should not simply calculate the risk or the cost and try to meet your commitments. You must strive for a complete superiority over every aspect of the project. You should prepare mitigations for everything the project can throw at you—which requires going beyond the mechanics and the numbers. You should adopt a holistic approach that involves your personality and attitude, how you interact with management and the developers, and the recognition of the effect that design has on the development process and the product life cycle. The ideas I have laid out for system and project design in both parts of this book open a portal to a parallel level of excellence in software engineering. It is up to you to keep that portal open, to keep improving, to refine these ideas, to develop your own style, and to adapt. This concluding chapter advises how you should approach these aspects, but more importantly shows you how to continue the journey.
对于何时设计项目这个问题,有几种答案。一种直接的回答是“始终”。与大多数软件项目的惨淡状况相比,项目设计所能提供的东西显然相当引人注目。
There are several answers to the question of when to design a project. One straightforward response is “always.” Compared with the dismal state of affairs for most software projects, what project design has to offer is understandably quite compelling.
作为一名工程师,我对“从不”和“总是”这样的绝对答案持谨慎态度。你应该从投资回报率的角度来回答何时设计项目的问题。将设计项目的时间和成本与以最快、最便宜和最安全的方式构建系统的好处进行比较。由于设计一个项目只需要几天到一周的时间,从投资回报率的角度来看,很容易证明设计大多数项目的合理性。此外,项目范围越大,你就越应该投入于能给你提供最佳解决方案的项目设计。对于一个大型且昂贵的项目,即使与最佳点只有微小偏差,其绝对值也可能非常大,并且很可能超过项目设计的成本。
As an engineer, I am wary of absolute answers like “never” and “always.” You should answer the question of when to design a project from an ROI perspective. Compare the time and cost of designing a project with the benefits of building the system in the fastest way, the least costly way, and the safest possible way. Since it takes just a few days to a week to design a project, from an ROI perspective it is easy to justify designing most projects. Furthermore, the larger the scope of the project, the more you should invest in project design that gives you the optimal solution. With a large and expensive project, even a minute change from the optimal point could be both huge in absolute terms and likely to surpass the cost of designing the project.
对于何时设计项目这个问题的另一个答案是“只要你有一个紧迫的最后期限。”即使没有压缩,仅仅让最高效的团队沿着普通解决方案的关键路径分配就胜过任何其他方法,尤其是与尝试迭代构建系统的项目相比。
Another answer to the question of when to design a project is “whenever you have an aggressive deadline.” Even without compression, merely having the most efficient team assigned along the critical path of a plain normal solution will beat any other approach, especially compared with projects that attempt to build the system iteratively.
何时设计项目这个问题的最终答案是整本书中最重要的部分。想象一下,你有一个关于下一个杀手级应用的想法,这个应用可能会非常成功。你需要一些资金来构建它,以支付从雇佣员工到支付云计算时间的费用。你可以寻求风险投资以换取大部分股权,然后每周工作 60 小时,持续数年,从事一件很可能会失败的事情。你也可以自己资助这个项目:你可以卖掉你的房子,变现你的养老金计划和毕生积蓄,并向朋友和家人借钱。
The final answer to the question of when to design a project is the most important section in this entire book. Imagine you have an idea for the next killer app, something that could be immensely successful. You need some capital to build it, to cover costs from hiring the people to paying for cloud compute time. You could seek venture capital in exchange for most of the equity and then work 60 hours a week for several years on something that is likely to fail. You could also self-fund the project: You could sell your house, liquidate your pension plan and life savings, and borrow from friends and family.
如果您选择自筹资金,您会投资于项目设计吗?这项投资是时间和精力上的小投资还是大投资?您会说您没有时间进行项目设计吗?您会说最好先开始构建一些东西,然后再弄清楚,还是会尽一切努力在破产和贫困之前找出项目是否负担得起?您会跳过项目设计的任何技术或分析吗?即使您能负担得起项目,您仍然会设计项目以确定风险排除区吗?您会再次重复所有计算以确保万无一失吗?您会先设计项目,看看是否应该卖掉房子并辞职吗?毕竟,如果项目需要 300 万美元,而您只能筹集 200 万美元,那么您应该保留房子,而不是新创业公司。项目持续时间也是如此。如果您只有一年的营销窗口,而项目实际上是一个为期两年的项目,那么您什么也不要做。当自筹资金时,您是否也不希望您的开发人员按照项目的详细装配说明进行工作,而不是浪费您微薄的资源试图自己解决问题?
If you choose the self-funding route, would you invest in project design? Would this investment be a little investment in time and effort or a large one? Would you say that you do not have time for project design? Would you say that it is better to just start building something and figure things out later, or would you do whatever it takes to find out if the project is affordable before becoming broke and destitute? Would you skip any of the techniques or analysis of project design? Even if you can afford the project, would you not still design the project to identify the risk exclusion zones? Would you repeat all the calculations a second time for good measure? Would you first design the project to see if you should sell your house and quit your job? After all, if the project requires $3 million and you are able to muster only $2 million, you should keep the house, not the new startup. The same goes for the duration of the project. If you have only a one-year marketing window and the project is really a two-year project, then you should do nothing. When self-funded, would you also not prefer that your developers work against detailed assembly instructions of the project, as opposed to wasting your scant resources trying to figure it out on their own?
接下来,设想一个项目,如果未能履行承诺,经理将承担个人责任。经理在履行承诺时不会获得丰厚的奖金,但如果未能履行承诺,经理必须自掏腰包支付项目成本超支,甚至损失销售额,以及任何合同义务。在这种情况下,经理会反对项目设计还是坚持吗?经理会因为“这不是我们在这里做事的方式”而抵制项目设计吗?经理会在系统和项目设计上投入少量或大量资金,以确保承诺与团队能够生产的东西相一致吗?经理会避免找出死亡地带吗?经理会放弃确保项目设计本身不会发生太大变化的可靠架构吗?经理会说,既然没有人以这种方式工作,那么这就是不设计项目的充分理由吗?
Next, imagine a project where the manager is held personally liable for any failure to meet the commitments. Instead of the manager earning a nice bonus when meeting the commitments, in the case of failure the manager has to pay out of pocket for the project cost overruns, if not the lost sales, as well as any contractual obligations. In such a situation, would the manager oppose project design or insist on it? Would the manager resist project design because “that is not how we do things here”? Would the manager invest a little or a lot in system and project design to ensure the commitments are aligned with what the team can produce? Would the manager avoid finding out where the death zone is? Would the manager give up on sound architecture that will ensure the project design itself will not change much? Would the manager say that since no one is working this way, that is a good enough reason not to design the project?
这种矛盾十分明显。当公司付钱时,大多数人都表现出冷酷、傲慢和自满的态度。大多数人避免独立思考,因为教条地遵循失败行业的普遍做法,并以此作为浪费他人金钱的借口要容易得多。大多数人只是找借口,比如他们没有时间,或者项目设计是一个错误的流程,或者项目设计过度。然而,当他们的头被砍掉时,同样的人会成为项目设计的狂热分子。这种行为上的差异是缺乏诚信的直接结果,无论是个人诚信还是职业诚信。何时设计项目这个问题的真正答案是当你有诚信的时候。
The dissonance is stark. Most people have a callous, cavalier, and complacent attitude when the company is paying. Most people avoid thinking for themselves because it is so much easier to dogmatically follow the common practices of a failing industry and use that as an excuse when squandering other people’s money. Most just make excuses such as that they do not have the time, or that project design is the wrong process, or that project design is over-engineered. Yet when their head is on the chopping block, the same people become project design zealots. Such a difference in behavior is a direct result of lack of integrity, both personal and professional. The real answer to the question of when to design a project is when you have integrity.
我能给你的最好的职业建议是:
The best career advice I can give you is this:
把公司的钱当成自己的钱。
Treat the company’s money as your own.
其他的都不重要。大多数经理无法区分优秀的设计和糟糕的设计,所以他们绝不会仅根据架构来提拔或奖励你。但是,如果你把公司的钱当作自己的钱,如果你彻底设计项目以找到最经济、最安全的系统构建方式,如果你断然拒绝任何其他行动方案,高层会注意到的。通过对公司的钱表现出最大的尊重,你将赢得他们的尊重,因为尊重总是相互的。相反,人们不会尊重那些对他们不尊重的人。当你对自己的行为和决定负责时,你在高层眼中的价值将大幅提升。如果你一再履行承诺,你就会赢得高层的信任。当下一次机会来临时,他们会把它交给他们信任的那个尊重他们时间和金钱的人:你。
Nothing else really matters. Most managers cannot tell the difference between a great design and a horrible design, so they will never promote or reward you based on architecture alone. However, if you treat the company’s money as your own, if you thoroughly design the project to find the most affordable and safest way of building the system, and if you flat out refuse any other course of action, the higher-ups will notice. By showing the utmost respect for the company’s money, you will earn their respect, because respect is always reciprocal. Conversely, people do not respect those who are disrespectful toward them. When you are accountable for your actions and decisions, your worth in the eyes of top management will drastically increase. If you repeatedly meet your commitments, you will earn the trust of the top brass. When the next opportunity comes, they will give it to the one person whom they trust to be respectful of their time and money: you.
这条建议来自我自己的职业生涯。30 岁之前,我领导了硅谷一家财富 100 强公司的软件架构团队,硅谷是全球软件行业竞争最激烈的地方。我的晋升之路与我的架构能力几乎没有关系(如前所述,这种能力很少能起多大作用)。然而,我确实总是将我的系统设计与项目设计捆绑在一起,这带来了很大的不同。在我看来,公司的钱就是我的钱。
This advice is drawn from my own career. Before I was 30 years old, I led the software architecture group of a Fortune 100 company in Silicon Valley, the most competitive place in the world for the software industry. My rise to the top had little to do with my architecture prowess (as discussed, that hardly ever amounts to much). I did, however, always bundle my system design with project design, and that made all the difference. In my mind, the company’s money was my money.
不要设计时钟。
Do not design a clock.
经过多年对软件项目的失望和幻灭之后,那些第一次接触到项目设计理念的人被它的精确性所吸引,被它的工程原理所吸引。他们试图在每次计算中追求每一个数字,并改进每一个假设和估计,从而错过了合理项目设计的要点。项目设计最重要的作用是做出明智的项目决策:是否继续进行,如果继续,选择哪种方案。您选择的项目设计方案总是与现实不同,实际的项目执行将类似,但与您设计的并不完全相同。项目经理必须通过频繁跟踪项目与计划的对比情况并采取纠正措施来跟进项目设计(参见附录 A)。
After years of disappointments and disillusion from software projects, those exposed for the first time to the ideas of project design are captivated by its precision and fascinated by its engineering principles. They are tempted to go after every last digit in every calculation and to refine every last assumption and estimation, thereby missing the point of sound project design. The most important thing that project design enables is making educated decisions about the project: whether to proceed at all, and if so, under which option. The project design option you choose will always differ from reality, and the actual project execution will be similar, but not quite what you have designed. The project manager must follow up on the project design by frequently tracking the project against the plan and taking corrective actions (see Appendix A).
即使是最好的项目设计方案也只能在执行过程中为您提供一个机会,仅此而已。请注意,此处的“最佳”是指最符合您的团队生产能力(就时间、成本和风险而言)的设计,而不一定是最佳设计。
Even the best project design solution just gives you a fighting chance during execution—nothing more. Note that “best” in this context means a design that is the most calibrated to what your team can produce (in terms of time, cost, and risk), not necessarily the optimal design.
将项目设计视为日晷,而不是时钟。日晷是一种非常简单的设备(一根垂直的棍子插在地上),但它足以精确到分钟地报时(如果你知道日期和纬度)。时钟可以精确到秒地报时,但它是一种更为复杂的设备,其中每个内部细节都必须完美调整才能正常工作。类似地,你的项目设计工作只需要足够好,大致说明可以承诺什么。每个细节都完美对齐的最佳精确解决方案很好,但必须有一个正常、可行的解决方案。
Think of project design as a sundial, rather than a clock. A sundial is an extremely simple device (a vertical stick in the ground), but it is good enough to tell the time down to the minute (if you know the date and the latitude). A clock can tell the time down to the second, but it is a far more intricate device in which every internal detail has to be perfectly tuned for it to work at all. By analogy, your project design effort needs to be only good enough to tell roughly what it is possible to commit to. Optimal, precise solutions where every last detail is perfectly aligned are nice, but a normal, doable solution is a must.
切勿设计没有坚实的架构来涵盖波动性的项目。
Never design a project without a solid architecture that encapsulates the volatilities.
如果没有正确的系统架构,系统设计在某个时候就会发生变化。这些变化意味着您将构建一个不同的系统,这将使项目设计无效。一旦发生这种情况,您在项目开始时是否有最好的项目设计就无关紧要了。正如本书第一部分所述,您需要投入时间来处理波动,无论您是否使用方法的结构来处理。
Without the correct system architecture, at some point the system design will change. Those changes mean that you will be building a different system, which will void the project design. Once that happens, it does not matter if you had the best project design at the beginning of the project. As prescribed in the first part of the book, you need to invest the time to deal with the volatilities, whether or not you use the structure of The Method to do so.
与架构不同,估算和特定资源对于良好的项目设计而言是次要的。网络的拓扑结构(源自架构)决定了项目的持续时间,而不是开发人员的能力,或者在某种程度上,个人估算的变化。与现实存在很大差异的估算可能会对项目产生巨大影响。但是,只要估算或多或少是正确的,那么实际持续时间稍大或稍小都无关紧要。对于一个体面的项目,您将有数十项活动,这些活动的个别估算可能会朝任何一个方向偏离。总体而言,这些偏移往往会相互抵消。开发人员的能力也是如此。如果您拥有世界上最差或最好的开发人员,这会产生巨大的差异,但只要您拥有优秀的开发人员,事情就会平衡。在提出项目设计理念、识别约束和解决陷阱方面发挥创造力比让每个估算都完全正确更重要。
Unlike architecture, estimations and specific resources are secondary to a good project design. The topology of the network (which derives from the architecture) dictates the duration of the project, not the capabilities of the developers or, to a point, the variation in individual estimations. Estimations that differ significantly from reality could affect the project drastically. However, as long as the estimation is more or less correct, then it does not matter if the real duration involved is somewhat larger or smaller. With a decent-size project you will have dozens of activities whose individual estimations may be off in either direction. Overall, these offsets will tend to cancel each other. The same is true with developers’ capabilities. It makes a huge difference if you have the world’s worst or best developer, but as long as you have decent developers, things will even out. It is more important to be creative in coming up with project design ideas, to recognize constraints, and to work around pitfalls than it is to get every estimation exactly right.
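The claim that individual estimation errors tend to cancel can be illustrated with a standard statistical result. The sketch assumes equal-size activities whose estimation errors are independent and unbiased. Neither assumption comes from the text, and the function name is hypothetical. Under those assumptions the relative error of the summed effort shrinks with the square root of the number of activities.

```python
from math import sqrt

def relative_error_of_total(num_activities, per_activity_error):
    """Relative standard deviation of the summed effort.

    Assumes equal-size activities with independent, unbiased errors:
    the total's standard deviation grows with sqrt(n) while the total
    itself grows with n, so the ratio falls as 1 / sqrt(n).
    """
    return per_activity_error / sqrt(num_activities)

# Hypothetical project: 36 activities, each estimate off by about 30%.
total_error = relative_error_of_total(36, 0.30)
print(f"{total_error:.0%}")
```

With 36 such activities, a 30% error per estimate translates into roughly a 5% error on the total. The cancellation does not help when the errors share a bias, such as every activity being underestimated, which corresponds to the case of estimations that differ significantly from reality.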
你不应该教条地运用本书中的观点。
You should not apply the ideas in this book dogmatically.
您应该根据自己的具体情况调整项目设计工具,但不要影响最终结果。本书旨在向您展示什么是可能的,激发您的好奇心,鼓励您发挥创造力并发挥领导作用。
You should adapt the project design tools to your particular circumstances without compromising on the end result. This book aims to show you what is possible, to trigger your natural curiosity, to encourage you to be creative, and to lead.
如果可能,不要秘密设计项目。设计成果和可见的设计流程可以与决策者建立信任。如果利益相关者询问,请告诉他们你在做什么以及你为什么要这样做。
When possible, do not design a project in secret. Design artifacts and a visible design process build trust with the decision makers. If stakeholders ask, educate them about what you are doing and why you are doing things this way.
以 Optionality 的方式与管理层沟通。
Communicate with management in Optionality.
当你与管理层打交道时,要说我称之为“可选性”的语言:简洁地描述管理层可以选择的选项,并能够客观地评估这些选项。这与项目设计中的一个核心概念非常吻合:没有“唯一”的项目。构建和交付任何系统总是有多种选择。每个选项都是时间、成本和风险的可行组合。因此,你应该设计几个这样的选项供管理层选择。
When you engage with management, speak the language I call Optionality: succinctly describing the options from which management can choose, and enabling objective evaluation of these options. This is very much aligned with a core concept in project design: There is no “the” project. There are always multiple options for building and delivering any system. Each option is some viable combination of time, cost, and risk. You should therefore design several such options from which management may choose.
良好管理的本质是选择正确的选项。此外,给予人们选择权可以赋予他们权力。毕竟,如果真的没有其他选择,那么经理也就没有必要了。缺乏选择的经理将被迫通过引入任意选项来证明其存在的合理性。如果没有项目设计作为后盾,这种人为的方案总是会取得糟糕的结果。为了避免这种危险,你必须向管理层提供一组可行的项目设计方案,这些方案由你预先选定。例如,第 11 章共调查了 15 个项目设计方案,但相应的 SDP 审查只有 4 个方案。
The essence of good management is choosing the right option. Moreover, giving people options empowers them. After all, if there is truly no other option, then there is also no need for the manager. Managers who lack options from which to choose will be forced to justify their existence by introducing arbitrary options. Without a backing project design, such contrived options always have poor results. To avoid this danger, you must present management with a set of viable project design options, preselected by you. For example, Chapter 11 investigated a total of 15 project design options, but the corresponding SDP review had only 4 options.
话虽如此,但不要过度使用可选性。提供太多选择会让人感到不安,这种困境被称为选择悖论。¹ 这种悖论的根源在于害怕错过一些你没有选择的更好选择,即使你选择的选项足够好。
That said, do not overdo Optionality. Giving too many options upsets people, a predicament known as the paradox of choice.¹ This paradox is rooted in the fear of missing out on some better option you did not choose, even if the option you did choose was good enough.
1. Barry Schwartz,《选择的悖论:为什么多即是少》(Ecco,2004 年)。
1. Barry Schwartz, The Paradox of Choice: Why More Is Less (Ecco, 2004).
以下是我针对要呈现多少个选项给出的指导原则:
Here are my guidelines on how many options to present:
两个选项太少了——几乎等于没有选项。
Two options is too few—too close to no options at all.
三个选项是理想的;大多数人可以轻松地在三个选项之间做出选择。
Three options is ideal; most people can easily choose between three options.
四个选项都可以,只要其中至少有一个(也许两个)是明显的错误。
Four options is fine as long as at least one of them (and maybe two) is an obvious mistake.
五个选项太多了,即使它们都是好的选项。
Five options is too many options, even if they are all good options.
压缩率不得超过 30%。
Do not exceed 30% compression.
无论你选择哪种方式来压缩项目,从合理的正常解决方案开始,30% 的工期缩短是你可能看到的最大压缩率。这种高度压缩的项目可能会面临较高的执行和工期风险。当你第一次开始使用项目设计工具并在团队中建立能力时,请避免使用压缩率超过 25% 的解决方案。
Whichever way you choose to compress the project, a 30% reduction in schedule is the maximum compression you will likely see when starting from a sound normal solution. Such highly compressed projects will probably suffer from high execution and schedule risk. When you first begin using the project design tools and building competency within your team, avoid solutions with more than 25% compression.
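The compression limits above can be turned into a quick check. The thresholds come from the text. The function names and the wording of the returned assessments are illustrative, and using the design-by-dependencies duration from Table 13-9 as the normal baseline is an assumption.

```python
MAX_COMPRESSION = 0.30       # practical ceiling stated in the text
BEGINNER_COMPRESSION = 0.25  # suggested limit for teams new to project design

def compression(normal_duration, compressed_duration):
    """Fraction of the normal solution's schedule removed by compression."""
    return (normal_duration - compressed_duration) / normal_duration

def assess(normal_duration, compressed_duration):
    c = compression(normal_duration, compressed_duration)
    if c > MAX_COMPRESSION:
        return "beyond the 30% ceiling"
    if c > BEGINNER_COMPRESSION:
        return "avoid until the team is experienced"
    return "within guidelines"

# Durations in months from Table 13-9: normal 7.8, compressed 7.1.
trademe = compression(7.8, 7.1)
print(f"{trademe:.0%}", assess(7.8, 7.1))
```

For those two durations the compression is about 9%, well inside both limits.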
始终压缩项目,即使追求任何压缩解决方案的可能性很低。
Always compress the project, even if the likelihood of pursuing any of the compressed solutions is low.
压缩揭示了项目的本质和行为,通过更好地了解自己的项目,总会有所收获。压缩允许您对项目的时间成本曲线进行建模,当您需要评估计划变更的影响时,获得成本和风险的公式会很有帮助。能够快速果断地确定变更请求的可能后果是非常有价值的。另一种选择是直觉和冲突。
Compression reveals the true nature and behavior of the project, and there is always something to gain by better understanding your own project. Compression allows you to model the project’s time–cost curve, and obtaining formulas for cost and risk is helpful when you are required to assess the effect of schedule changes. It is immensely valuable to be able to quickly and decisively determine the likely consequence of a change request. The alternative is gut feel and conflict.
即使你怀疑某个请求不合理,说“不”——尤其是对一个有权威和权力的人说“不”对你的职业生涯毫无益处。说“不”的唯一方法是让“他们”说“不”。通过展示对进度、成本和风险的量化影响,你可以立即将之前只能凭直觉感知的事情浮出水面,从而实现不带感情的客观讨论。在没有数字和测量的情况下,任何事情都有可能发生。无视现实不是罪过,玩忽职守才是罪过。如果决策者知道一些数字与他们对客户的承诺相矛盾,并且仍然坚持这些承诺,那么他们就是在实施欺诈。因为这种责任是不可接受的,所以在有确切数字的情况下,他们会想方设法撤销承诺或更改之前“无法更改”的日期。
Even if you suspect that an incoming request is unreasonable, saying “no”—especially to a person of authority and power—is not conducive to your career. The only way to say “no” is to get “them” to say “no.” By showing the quantified effects on schedule, cost, and risk, you immediately bring to the surface what before you could only intuit, enabling an emotion-free, objective discussion. In the absence of numbers and measurements, anything goes. Ignorance of reality is not a sin, but malpractice is. If decision makers are aware of numbers that contradict their commitments to customers and still persist with those commitments, they are perpetrating fraud. Because such liability is unacceptable, in the presence of hard numbers, they will find ways of rescinding their commitments or changing previously “unchangeable” dates.
谨慎、明智地使用顶级资源进行压缩。
Compress with top resources carefully and judiciously.
当依赖顶级资源时,正确的项目设计对于知道在哪里应用它们至关重要。压缩顶级资源虽然很有吸引力,但可能会适得其反。首先,顶级人才通常很稀缺,因此您履行承诺所需的顶级资源可能无法获得。等待他们会导致延误并违背压缩的目的。即使有顶级资源可用,也可能使情况变得更糟,因为利用它们压缩关键路径可能会出现新的关键路径。由于您根据浮动时间和能力分配资源,因此您现在面临的风险是,最差的开发人员将在新的关键路径上工作。
When relying on top resources, proper project design is essential to know where to apply them. As appealing as it may be, compressing with top resources may backfire. To begin with, top talent is typically scarce, so the top resources you require to meet your commitments may not be available. Waiting for them creates delays and defeats the purpose of the compression. Even when available, top resources may make things worse because leveraging them to compress the critical path could make a new critical path emerge. Since you assign your resources based on float and capabilities, you now run the risk that the worst developers will be working on that new critical path.
即使被分配到以前的关键活动,顶级资源也经常处于闲置状态,等待项目中的其他活动和开发人员赶上进度。这降低了项目的效率。为了避免这种情况,您可能需要一个更大的团队,可以通过并行工作来压缩其他路径。团队规模的这种增加会降低效率并增加成本。最后,使用顶级资源进行压缩通常需要两个或更多这样的英雄来压缩多个关键或近乎关键的路径才能从压缩中看到任何好处。
Even when assigned to formerly critical activities, the top resources often are idle, waiting for other activities and developers in the project to catch up. This reduces the project’s efficiency. To avoid this situation, you may need a larger team that can compress other paths by working in parallel. Such an increase in team size will reduce efficiency and increase the cost. Finally, compressing using top resources often requires two or more such heroes to compress multiple critical or near-critical paths to see any benefit from the compression.
When assigning top resources, you should avoid doing so blindly (such as assigning the top resource to all current critical activities). Evaluate which network path would benefit the most from the resources, determine the effect on other paths, and even try combinations across chains. You may have to reassign the top resource several times based on the changes to the critical path. You should also look at activity size as well as the criticality. For example, you may have a large, noncritical activity with a high level of uncertainty that could easily derail the project. Assigning the top resources there will reduce that risk and ultimately help you meet your commitments.
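The effect described above, where compressing the critical path makes a new critical path emerge, can be checked with a small float calculation. The following is a minimal sketch rather than a prescribed tool: the three-activity network, the durations in days, and the `analyze` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def analyze(durations, depends_on):
    """Return (project duration, total float per activity) for an activity network."""
    # Order activities so that each one comes after its predecessors.
    order, seen = [], set()
    def visit(activity):
        if activity in seen:
            return
        seen.add(activity)
        for pred in depends_on.get(activity, []):
            visit(pred)
        order.append(activity)
    for activity in durations:
        visit(activity)

    # Forward pass: earliest finish of each activity.
    earliest_finish = {}
    for a in order:
        start = max((earliest_finish[p] for p in depends_on.get(a, [])), default=0)
        earliest_finish[a] = start + durations[a]
    project = max(earliest_finish.values())

    # Backward pass: latest finish that does not delay the project.
    successors = defaultdict(list)
    for a, preds in depends_on.items():
        for p in preds:
            successors[p].append(a)
    latest_finish = {}
    for a in reversed(order):
        latest_finish[a] = min(
            (latest_finish[s] - durations[s] for s in successors[a]),
            default=project,
        )

    # Total float is the slack between latest and earliest finish.
    floats = {a: latest_finish[a] - earliest_finish[a] for a in order}
    return project, floats

# Hypothetical network: A and B run in parallel, C needs both.
durations = {"A": 10, "B": 8, "C": 5}
depends_on = {"C": ["A", "B"]}

before = analyze(durations, depends_on)
# Assume a top resource cuts activity A from 10 days to 6.
after = analyze({**durations, "A": 6}, depends_on)
print(before)
print(after)
```

In this made-up network, cutting A by 4 days shortens the project by only 2 days, because B (which previously had 2 days of float) becomes critical once A is compressed.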
The easiest way of compressing the project is to trim the project’s initial activities, the fuzzy front end.
While no project can be accelerated beyond its critical path, no such rule applies to the front end. Look for ways of working in parallel at the front end on preparatory or evaluation tasks. This would compress the front end (and thus the project) without any change to the rest of the project. For example, Figure 14-1 shows a project (the upper chart) with a long front end. The front end contains a few crucial technology and design choices that the architect had to settle before the rest of the project could proceed. By hiring a second architect as a contractor for two of these decisions, the front end duration was reduced by a third (the lower chart in Figure 14-1).
Figure 14-1 Trimming the front end with a second architect
Preempt the unforeseen with float.
The risk index indicates whether the project will break down when it hits the first obstacle or whether the project can leverage that obstacle to introduce refinements, adapting to make the design a better approximation of reality. Having sufficient float (indicated by the low risk) gives you a chance to thrive in the face of the unforeseen.
I also find that the project’s need for float is as much psychological as it is physical. The physical need is clear: You can consume float to handle changes and shift resources. The psychological need is the peace of mind of all involved. In projects with enough float, people are relaxed; they can focus and deliver.
Chapter 10 suggested 0.5 as the minimum decompression target and 0.3 as the minimum risk level. As valuable as these risk guidelines are, when examining the risk curve of the project you should be aware that behavior is more important than values. When decompressing the project, look for the risk tipping point rather than the 0.5 value. Something may be skewing the whole risk curve higher or lower, but there could still be a tipping point for risk. This is especially the case when the normal solution already has low risk. You may need to decompress the project, but you do that by using the tipping-point behavior.
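When the whole curve is skewed, a fixed 0.5 threshold may never be crossed, yet the curve can still have a region where a little decompression buys a large drop in risk. The sketch below assumes one simple reading of "tipping point" (the steepest segment between sampled points) and uses invented risk values for a curve that sits entirely below 0.5. The `steepest_drop` helper and the sample data are hypothetical, and the risk models of Chapter 10 are not reproduced here.

```python
def steepest_drop(curve):
    """Return the pair of adjacent durations between which risk falls fastest.

    `curve` is a list of (duration, risk) samples sorted by duration.
    """
    def slope(segment):
        (d0, r0), (d1, r1) = segment
        return (r1 - r0) / (d1 - d0)

    # The most negative slope marks the sharpest reduction in risk.
    (d0, _), (d1, _) = min(zip(curve, curve[1:]), key=slope)
    return d0, d1

# Hypothetical samples: project duration in weeks versus risk index.
risk_curve = [(20, 0.46), (21, 0.45), (22, 0.40), (23, 0.31), (24, 0.29), (25, 0.28)]
tipping = steepest_drop(risk_curve)
print(tipping)
```

With these sample values, the risk never reaches 0.5, but the behavior of the curve still points at the region between weeks 22 and 23 as the place where decompression pays off most.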
Project design is a detail-oriented activity. You should treat the act of project design as just another intricate effort that you need to map out and design. In other words, you need to design project design, and even use the tools of project design when doing so. You begin this design effort with the system design and proceed to designing the project as a single continuous design effort.
To help you get started, here is a list of common design activities:
1. Gather core use cases
2. Design the system and produce call chains and a list of components
3. List noncoding activities
4. Estimate the duration and required resources for all activities
5. Estimate the overall project using broadband and/or a tool
6. Design the normal solution
7. Explore the limited-resources solution
8. Find the subcritical solution
9. Compress using top resources
10. Compress using parallel work
11. Compress using activity changes
12. Compress to minimum duration
13. Perform throughput, efficiency, and complexity analysis
14. Produce the time–cost curve
15. Decompress the normal solution
16. Rebuild the time–cost curve
17. Compare the time–cost curve to the overall project estimation
18. Quantify and model risk
19. Find inclusion and exclusion and risk zones
20. Identify viable options
21. Prepare for SDP review
While some of these activities could take place in parallel to other activities, the activities in system design and project design do have interdependencies. The next logical step is to design your project design using a simple network diagram and even calculate the total duration of the effort. Figure 14-2 shows such a network diagram of the design of project design. You can identify the likely critical path using typical durations for the activities. If a single architect is designing the project, then the diagram will actually be a long string. If the architect has someone helping, or if the architect is waiting for some piece of information, the diagram suggests activities to do in parallel.
Figure 14-2 Design of project design
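As a rough illustration of designing the design effort itself, the sketch below takes the first six design activities, assigns them made-up durations and dependencies, and compares a single architect working through them as one long string against the longest dependency path when help is available. The durations (in days), the dependencies, and the helper names are assumptions for illustration only and are not taken from Figure 14-2.

```python
def finish_times(durations, depends_on):
    """Earliest finish of each activity, assuming unlimited parallel help."""
    finish = {}
    def finish_of(activity):
        if activity not in finish:
            start = max((finish_of(p) for p in depends_on.get(activity, [])), default=0)
            finish[activity] = start + durations[activity]
        return finish[activity]
    for activity in durations:
        finish_of(activity)
    return finish

def critical_path(durations, depends_on):
    """Walk back from the last activity through its latest-finishing predecessors."""
    finish = finish_times(durations, depends_on)
    path = [max(finish, key=finish.get)]
    while depends_on.get(path[-1]):
        path.append(max(depends_on[path[-1]], key=finish.get))
    return list(reversed(path)), max(finish.values())

# Hypothetical durations in days for the first six design activities.
durations = {
    "Gather core use cases": 2,
    "Design the system": 3,
    "List noncoding activities": 1,
    "Estimate activities": 2,
    "Estimate overall project": 1,
    "Design the normal solution": 2,
}
# Hypothetical dependencies (activity -> activities it must wait for).
depends_on = {
    "Design the system": ["Gather core use cases"],
    "List noncoding activities": ["Gather core use cases"],
    "Estimate activities": ["Design the system", "List noncoding activities"],
    "Estimate overall project": ["Design the system"],
    "Design the normal solution": ["Estimate activities"],
}

single_architect = sum(durations.values())   # one long string of activities
path, with_help = critical_path(durations, depends_on)
print(single_architect, with_help, path)
```

Under these assumptions the lone architect needs 11 days, while with help the effort is bounded by the 9-day chain from gathering use cases to the normal solution.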
Activities 6, 7, 8, 9, 10, 11, and 12 in the list (shown in blue in Figure 14-2) are specific project design solutions. You can further break down each of those into this list of tasks:
Discover planning assumptions
Gather staffing requirements
Review and revise the list of activities, estimations, and resources
Decide on dependencies
Modify the network to accommodate constraints
Modify the network to reduce complexity
Assign resources to activities and rework the network
Draw the network diagram
Evaluate the shallow S curve
Evaluate the staffing distribution chart
Modify the planning assumptions and rework the network
Calculate cost elements
Analyze floats
Calculate risk
In any system it is important to distinguish between effort and scope. The architecture in a software system must be all-encompassing both in scope and in time. It must include all required components, and it must be correct at the present time and in the far future (as long as the nature of the business does not change). You must avoid the very expensive and destabilizing changes that are the result of a flawed design. When it comes to the effort, the architecture should be very limited. Part 1 of this book explained how you can come up with a solid, volatility-based decomposition in a few days to a week, even for a large system. Doing so requires knowing how to do things correctly, but it is certainly possible with practice and experience.
Compared to the architecture, design (especially the detailed design of services or the user interface of Clients) is both more time-consuming and more limited in scope. It may take several weeks to refine the detailed design of just a few interacting services.
Finally, coding is the most time-consuming and the most limited in scope. Developers should never code more than one service at a time, and they will spend considerable time testing and integrating each service as well.
Figure 14-3 illustrates in a qualitative manner the scope versus effort for a software project. You can see that scope and effort are literally inverses of each other. When something is wider in scope, it is narrow in effort, and vice versa.
Figure 14-3 Scope versus effort in a software system
Chapter 3 discusses the concept of mapping subsystems to vertical slices of the architecture. In a large project you may have several such subsystems. These subsystems should be fairly decoupled and independent from one another. Each subsystem has its own collection of activities, such as detailed design and construction. In a sequential project, the subsystems are consecutive, as shown in Figure 14-4.
Figure 14-4 Sequential project life cycle
Note that the subsystems are always designed and constructed in the context of the existing architecture. The effort allocation in Figure 14-4 is still that of Figure 14-3.
You may be able to compress the project and start working in parallel. Figure 14-5 shows two views of concurrent subsystem development aligned against the timeline.
Figure 14-5 Parallel project life cycles
Which parallel life cycle you choose depends on the level of dependencies between the subsystems of the architecture. In Figure 14-5, the life cycle on the right staggers the subsystems to overlap on the timeline. In this case, you can start building a subsystem once the implementation of the interfaces of the subsystem on which it depends is complete. You can then work on the rest of the subsystem in parallel to the previous one. You can even create fully parallel pipelines like the layout on the left of Figure 14-5. In this case, you build each subsystem independently of and concurrently with the other subsystems with minimum integration.
The composition and makeup of the team have a significant effect on project design. Here, team composition refers specifically to the ratio of senior to junior developers. Most organizations (and even individuals) define seniority based on years of experience. The definition I use is that senior developers are those capable of designing the details of the services, whereas junior developers cannot. Detailed design takes place after the major architectural decomposition of the system into services. For each service, the detailed design contains the design of the service's public interfaces or contracts, its messages and data contracts, and internal details such as class hierarchies or security.
Note that the definition of senior developers is not developers who can or know how to do detailed design. Instead, senior developers are those capable of doing detailed design, once you show them how to do so correctly.
When all you have are junior developers, the architect must provide the detailed design of the services. This defines the junior hand-off between the architect and the developers. The junior hand-off disproportionally increases the architect’s workload. For example, in a 12-month project, some 3 to 4 months of the overall duration could be spent simply on detailed design.
The architect’s detailed design work can take place in the front end or while developers are constructing some of the services. Both of these options are bad.
Coming up with the correct details of all the services up front is very demanding, and seeing in advance how all the details across all services mesh together sets a very high bar. It is possible to design a few services up front, but not all of them. The real problem is that detailed design in the front end simply takes too long. Management is unlikely to understand the importance of detailed design and will cringe at the prospect of extending the front end to accommodate it. Consequently, management will force handing off the architecture to junior developers and doom the project.
Designing the services on the fly, in parallel to the developers who are constructing services that the architect has already designed, could work. However, overloading the architect with detailed design makes the architect a bottleneck and may considerably slow down the project.
Senior developers are essential to address the detailed design challenge. If not already capable of doing so, with modest training and mentoring senior developers can perform the detailed design work, allowing for a senior hand-off between the architect and the developers.
With a senior hand-off, the architect can hand off the design soon after the SDP review, providing only a general outline of the services using gross terms for interfaces or just suggesting a design pattern. The detailed design now takes place as part of each individual service, and the architect just needs to review it and amend as needed. In fact, the only reason to pay for additional senior developers is to enable the senior hand-off. The senior hand-off is the safest way of accelerating any project because it compresses the schedule while avoiding changes to the critical path, increasing the execution risk, or introducing bottlenecks. Since shorter projects will cost less, it follows that senior developers actually cost less than junior developers.
The problem with the senior hand-off is the scant availability of senior developers. You may have one or two of them, and perhaps three, but not an entire team. If that is your situation, you should not use your one or two senior developers as developers. Instead, change the process to have these senior developers do mostly detailed design work. Figure 14-6 shows what that process flow looks like.
Figure 14-6 Working in parallel with junior architects
The architect must provide a comprehensive architecture, as discussed in Part 1 of this book. The architecture will not change during the life of the system, and the construction is always done in the context of that architecture. Producing the architecture still takes place at the project’s front end. The front end may also contain the detailed design of the first handful of services. This detailed design is both done by the senior developers and used as a training and learning opportunity under the guidance of the architect. This, in effect, turns the senior developers into junior architects.
Once the detailed design of the services is complete, the junior developers can step in and construct the actual services. However, any design refinement, as trivial as it may be, requires the junior developers to consult with the senior developer who designed that service. Once finished with each service construction, the junior developers proceed to code review with the senior developers (not their junior peers), followed by integration and testing with other junior developers. Meanwhile, the senior developers remain busy with the detailed design of the next batch of services. Each design is reviewed with the architect before hand-off to the junior developers.
Working this way is the best and only way of mitigating the risks of the junior hand-off. Clearly, it also requires meticulous project design. You must know exactly how many services you can design in advance and how to synchronize the hand-offs with the construction. You must also add explicit service detailed design activities and even additional integration points to address the risk of extracting the detailed design out of the services.
As with system design, when it comes to project design, you must practice. The basic expectation of professionals—from lawyers to doctors to pilots—is that they know their trade by heart and that they keep at it. Under fire, everybody sinks to their level of training. Unfortunately, unlike system design, hardly any software architect is even aware of project design or is trained in it, even though project design is both critical to success and, as discussed in Chapter 7, the software architect’s responsibility.
Compounding the need for project design practice are two additional issues. First, project design is a vast topic. This book covers the core body of knowledge required of modern software architects, both system design and project design. In terms of its page count, project design outweighs system design by 2:1. You should now have a feeling that you are peering into a deep rabbit hole. You cannot internalize and correctly use the concepts of this book without training and practice. Figuring out project design by designing real projects on the job not only is asking for trouble, but also defies common sense. Would you like to be the first patient of a doctor fresh out of medical school? Would you like to fly with a new pilot? Are you proud of your first program?
Second, project design, in many cases, produces non-intuitive results. You will have to practice not just to master a massive body of knowledge, but also to develop a new intuition. The good news is that project design skills can be acquired, as is evident from the swift and marked improvement in the quality of the project designs and the success rate of those who do practice.
Chapter 2 emphasized the importance of system design practice. Always combine practice in designing a system and practice in designing the project to build it. Start with just a simple normal solution. Train until you are comfortable with normal solutions for your practice systems. Then, build from there to find the best solution as far as schedule, cost, and risk.
Examine your own past projects. With the advantage of hindsight, try to reconstruct the project design that took place and contrast it with what should have been done. Identify the planning assumptions, the classic mistakes, and the right decisions. Prepare for an SDP review by listing all the solutions you would have proposed if you could. Look at your current project. Can you list the activities, come up with the correct estimations based on what the team is presently doing, and calculate the true schedule and cost? What is the current risk level? What is required to decompress the project? What level of compression is feasible?
When you think you have got it right, raise the bar again and find ways of improving these designs. Never rest on your laurels. Develop new techniques, refine your own style, and become a passionate expert and advocate of project design.
Debriefing is underutilized in the software industry, even though it is an effective technique with fantastic ROI. A debrief of your project design effort and results is important. It provides a way to share lessons learned across projects and roles so that each person can learn from the experience of others. All it takes is self-reflection, analysis, and the desire to improve. You should debrief each and every one of your projects and make the debriefing part of your software development life cycle. You should debrief each project as a whole and debrief each subsystem or milestone as well. The more you make debriefing part of your routine, the more likely you are to actually debrief and benefit from it.
The debrief topics depend on what you deem important and what needs improvement. They may include the following considerations:
Estimations and accuracy. For each activity, ask yourself how accurate the initial estimation was when compared with the actual duration, and how many times you had to adjust the estimations and in which direction. Is there a noticeable pattern that you could incorporate in future projects to improve the estimations? Review the initial list of activities to see what you missed and what was superfluous. Calculate the extent to which the errors in the estimations canceled each other out.
Design efficacy and accuracy. Compare the accuracy of the initial broad project estimation with the detailed project design and the actual duration and cost. How accurate was your assessment of the throughput of the team? Was risk decompression necessary, and if so, was it too much or too little? Finally, was the compressed project doable, and how did the project manager and the team handle complexity?
Individual and team work. How well did the team members work as a team or individually? Were there any bad apples? Can you make the team more productive in the future by using better tools or technology? Did the team communicate issues in a timely manner? How well did the team members understand the plan and their role in it?
What to avoid or improve next time. Compile a prioritized list of all the mistakes or troubles encountered across people, process, design, and technology. For each item, identify how you could have detected it sooner or avoided it in the first place. List both actions that caused problems and actions that should have taken place. You should also include near-misses that did not end up causing harm.
Recurring issues from previous debriefs. One of the best ways to improve is to avoid past mistakes and prevent known problems from happening. It is detrimental to everyone when the same mistakes appear in project after project. There is likely a very good reason why the same problem is recurrent. Nonetheless, you must eliminate recurring mistakes in spite of the challenges.
Commitment to quality. What level of commitment to quality was missing or present? How intimately related was it to success?
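For the estimations-and-accuracy topic above, a debrief can start from a simple table of estimated versus actual durations. The sketch below is one possible way to quantify how much the errors canceled each other out. The `estimation_debrief` helper, the "canceled" ratio it computes, and the sample durations (in days) are all assumptions made for illustration, not a prescribed metric.

```python
def estimation_debrief(estimates, actuals):
    """Summarize estimation errors per activity and across the project."""
    # Positive error: the activity took longer than estimated.
    errors = {a: actuals[a] - estimates[a] for a in estimates}
    net = sum(errors.values())                     # overall schedule slip
    gross = sum(abs(e) for e in errors.values())   # total amount of misestimation
    # Share of the misestimation hidden by over- and underestimates offsetting.
    canceled = 1 - abs(net) / gross if gross else 0.0
    return errors, net, gross, canceled

# Hypothetical estimated and actual durations in days.
estimates = {"A": 10, "B": 15, "C": 5, "D": 20}
actuals = {"A": 12, "B": 13, "C": 6, "D": 22}

errors, net, gross, canceled = estimation_debrief(estimates, actuals)
print(errors, net, gross, round(canceled, 2))
```

In this sample, the activities were misestimated by 7 days in total but the project slipped by only 3, so more than half of the error was masked by offsetting mistakes. The sign pattern in `errors` is also the place to look for a recurring bias.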
It is important to debrief even successful projects that have met their commitments. You must know if you have succeeded just because you were lucky or because you had a viable system and project design. Even when the project is a success, could you have done a better job? What should you do to sustain the things you did right?
In the abstract, everything in this book is about quality. The very purpose of having a sound architecture is to end up with the least complex system possible. This provides for a higher-quality system that will be easier to test and maintain. There is no denying it: Quality leads to productivity, and it is impossible to meet your schedule and budget commitments when the product is rife with defects. When the team spends less time hunting problems, the team spends more time adding value. Well-designed systems and projects are the only way to meet a deadline.
With any software system, quality hinges on having a project design that includes the crucial quality-control activities as an integral part of the project. Your project design must account for the quality control activities both in time and in resources. Do not cut corners if your project design goal is to build the system quickly and cleanly.
A side effect of project design is that well-designed projects are low-stress projects. When the project has the time and the resources it requires, people are confident in their own ability and in their project’s leadership. They know the schedule is doable and that every activity is accounted for. When people are less stressed, they pay attention to details, and things do not fall between the cracks, resulting in better quality. In addition, well-designed projects maximize the team’s efficiency. This contributes to quality by allowing the team to more readily identify, isolate, and fix defects in the least costly way.
Your system and project design effort should motivate the team to produce the highest-quality code possible. You will see that success is addictive: Once people are exposed to working correctly, they take pride in what they do and will never go back. No one likes high-stress environments afflicted by low quality, tension, and accusations.
Your project design should always account for quality-control elements or activities. These include the following:
Service-level testing. When estimating the duration and effort of each service, make certain the estimation includes the time needed to write the test plan for the service, to run the unit test against the plan, and to perform integration testing. If relevant, add the time to roll the integration testing into your regression testing.
System test plan. The project must have an explicit activity in which qualified test engineers write the test plan. This includes a list of all the ways to break the system and prove it does not work.
System test harness. The project must have an explicit activity in which qualified test engineers develop a comprehensive test harness.
System testing. The project must have an explicit activity in which the software quality-control testers execute the test plan while using the test harness.
Daily smoke tests. As part of the indirect cost of the project, on a daily basis, you must do a clean build of the evolving system, power it up, and (figuratively) flush water down the pipes. This kind of smoke test will uncover issues in the plumbing of the system, such as defects in hosting, instantiation, serialization, connectivity, timeouts, security, and synchronization. By comparing the result with the previous day’s smoke test, you can quickly isolate plumbing issues.
Indirect cost. Quality is not free, but it does tend to pay for itself because defects are horrendously expensive. Make sure to account correctly for the required investments in quality, especially when it is in the form of indirect cost.
Test automation scripting. Automating the tests should be an explicit activity in the project.
Regression testing design and implementation. The project must have comprehensive regression testing that detects destabilizing changes the moment they happen across the system, subsystems, services, and all possible interactions. This will prevent a ripple effect of new defects introduced by fixing existing defects or simply making changes. While executing regression testing on an ongoing basis is often treated as an indirect cost, the project must contain activities for writing the regression testing and its automation.
System-level reviews. Chapter 9 discussed the need to engage in extensive peer reviews at the service level. Since defects can occur anywhere, you should extend reviews to the system level. The core team and the developers must review the system requirements spec, the architecture, the system test plan, the system test harness code, and any additional system-level code artifacts. Both with service and system reviews, the most effective and efficient reviews are structured in nature [2] and have designated roles (moderator, owner, scribe, reviewers), as well as follow-ups to ensure the recommendations are applied across the system. At a minimum, the team should hold informal reviews that involve walking through these artifacts with one or more peers. Regardless of the method used, these reviews require a high degree of mutual involvement along with a team spirit of commitment to quality. The reality is that delivering high-quality software is a team sport.
2. https://en.wikipedia.org/wiki/Software_inspection
This list is only partial. The objective here is not to provide you with all the required quality-control activities, but rather to get you to think about all the things you must do in your project to control quality.
Your project design should always account for quality-assurance activities. Previous chapters (especially Chapter 9) have already discussed quality assurance, but you should add the following quality assurance activities to your process and your project design:
Training. It costs significantly less (and is much better quality-wise) if your developers do not attempt to figure out new technologies on their own. By sending the developers to training (or bringing the training in-house), you instantly eliminate many defects due to learning curves or lack of experience.
Authoring key SOPs. Software development is so complex and challenging that nothing should be left to chance. If you do not have standard operating procedures (SOPs) for all key activities, devote the time to researching and writing them.
Adopting standards. Similar to SOPs, you must have a design standard (see Appendix C) and a coding standard. By following best practices, you will prevent problems and defects.
Engaging QA. Actively engage a true quality-assurance person. Have that person review the development process, tune it to assure quality, and create a process that is both effective and easy to follow. This process should support investigation and elimination of the root cause of defects or, even better, should proactively prevent problems from happening in the first place.
Collecting and analyzing key metrics. Metrics allow you to detect problems before they happen. They include development-related metrics such as estimation accuracy, efficiency, defects found in reviews, quality and complexity trends, as well as run-time metrics such as uptime and reliability. If required, devise the activities to build the tools that collect the metrics, and account for the indirect cost of collecting and analyzing them on a regular basis. Back it up with an SOP that mandates acting on abnormal metrics.
Debriefing. As described in the previous section, debrief your work as you progress, and debrief the project as a whole once it is completed.
Most managers do not trust their teams. Those managers have experienced too many disappointments, and they see little or no correlation between the effort the team expends and the desired results. Consequently, the managers resort to micromanaging everything. This is a direct result of a chronic deficit in trust. The developers respond to the micromanagement with frustration and apathy, and lose any remaining shred of accountability. This degrades trust even further, vindicating the sentiments of the managers.
The best way of turning this dynamic around is by infecting the team with a relentless obsession for quality. When totally committed to quality, the team will drive every activity from a perspective of quality, fixing the broken culture and creating an atmosphere of engineering excellence. To reach this state, you must provide the right context and environment. In practice, this means doing everything in this book—and more.
The result will be a transition from micromanagement to quality assurance. Allowing and trusting people to control the quality of their work is the essence of empowerment. Once this is in place, you will learn that quality is the ultimate project management technique, requiring very little management while maximizing the team’s productivity. The managers now focus on facilitating the correct environment for the team, trusting the team to produce impeccable software systems, on time and on budget.
One of the most misunderstood quotes in history is attributed to Field Marshal Helmuth von Moltke, the Elder: “No battle plan survives contact with the enemy.” Ever since, this statement has been taken out of context as a justification for no planning at all—the complete opposite of its original intent. Von Moltke, known as the architect of the Franco-Prussian War of 1870, was a military planning genius credited with a series of stunning military victories. Von Moltke realized that the key to success in the face of rapidly changing circumstances is to not rely on a single static plan. Instead, you must have the flexibility to pivot quickly between several meticulously laid-out options. The purpose of the initial plan is merely to provide a fighting chance by aligning the available resources with the objective as best as possible. From that point onward, one must constantly track against the plan and revise it as needed, often by coming up with variations of the current plan, switching to an alternative preplanned option, or devising new options altogether.
In the context of system and project design, von Moltke’s insight is as relevant today as it was 150 years ago. The project design techniques in this book support two objectives. The first objective is to drive an educated decision during the SDP review, to ensure the decision makers choose a viable option. Such an option serves as a good-enough starting point coming into execution, allowing for a fighting chance. The second objective for project design is to adapt the plan during execution. The project manager must constantly correlate what is actually going on with the plan, and the architect needs to use the project design tools to redesign the project to respond to reality. This often takes the form of modest project redesign iterations. You want to avoid any gross corrections, and instead drive the project smoothly using numerous small corrections. Otherwise, the degree of correction required may be wrenching and cause the project to fail.
A good project plan is not something you sign off and file in a drawer, never again to see the light of day. A good project plan is a live document that you constantly revise to meet your commitments. This requires knowing where you are with respect to the plan, where you are heading, and what corrective actions to take in response to changing circumstances. This is what project tracking is all about.
Project tracking is part of project management and execution and is not the responsibility of the software architect. I therefore include project tracking in this book, but as an appendix to the main discussion of system and project design.
Project tracking requires being able to tell where the project is across resources and activities. In the previous chapters, the discussion of project activities looked at activities mostly as atomic units, with a duration or cost estimation for each activity. This allows for designing the project regardless of what happens inside the activities. That approach is insufficient for project tracking. You can break each activity in the project—be it a service or a noncoding activity—into its own little life cycle, complete with internal tasks. Such tasks can be sequential, interleaved on the timeline, or iterative. For example, Figure A-1 shows a possible life cycle of a service.
Figure A-1 Service development life cycle
Each service starts with a service requirement spec (SRS). This can be brief, as little as a few paragraphs or pages outlining what the service is required to do. The architect needs to review the SRS. With the SRS in place, the developer can proceed to write a service test plan (STP) listing all the ways the developer will later demonstrate the service does not work. Even with a senior hand-off, when the developer is capable of performing the detailed design of the service, the developer cannot always start the detailed design without gaining additional insight into the nature of the service. The best way of obtaining that insight is via some construction to get a first-hand understanding of what the technology can provide or what the available detailed design options are. Armed with that insight, the developer can proceed to design the details of the service, which the architect then reviews (perhaps with others). Once the detailed design is approved, the developer can construct the code for the service. In tandem with the construction of the service, the developer builds a white-box test client. This test client enables the developer to test every parameter, condition, and error-handling path by invoking the debugger on the evolving code. With the code complete, the developer reviews the code with the architect and the other developers, integrates the service with other services, and finally performs black-box unit testing against the test plan.
Note that each review task in the diagram must complete successfully. A failing review causes the developer to repeat the preceding internal task. For clarity, Figure A-1 does not show these retries.
Regardless of the specific life-cycle flow, most activities will have internal phases such as Requirements, Detailed Design, or Construction. Each phase comprises one or more internal tasks, as shown in Figure A-2.
Figure A-2 Activity phases and tasks
For example, the Detailed Design phase may include some construction, the detailed design itself, and the design review. The Construction phase may include the actual construction, the test client, and the code review.
To support tracking, it is important to define a binary exit criterion for each phase—that is, a single condition used to judge whether the phase is either done or not done. With the life cycle in Figure A-2 you can use the reviews and the testing as binary exit criteria for the phase. For example, the Construction phase is complete once you have had the code review, not simply when the code is checked in.
While each activity may have multiple phases, these phases may not contribute equally to the completion of the activity. You need to assess the contribution of a phase in the form of a weight, in this case a percentage. For example, consider the activity with the phases listed in Table A-1. In this sample activity, the Requirements phase counts for 15% of the completion of the activity, while the Detailed Design phase counts for 20% of the completion.
Table A-1 Activity phases and weights
| Activity Phase | Weight (%) |
|---|---|
| Requirements | 15 |
| Detailed Design | 20 |
| Test Plan | 10 |
| Construction | 40 |
| Integration | 15 |
| Total | 100 |
You can allocate the weight of the phases in several ways. For example, you can estimate the importance of the phase, or you can estimate the duration in days for each phase and divide by the sum of all phases. Alternatively, you can just divide by the number of phases (e.g., with 5 phases, each phase counts as 20%), or you can even consider the type of the activity. For example, you may decide that the Requirements phase will be weighted 40% for the UI activity and only 10% for the Logging activity.
For accurate tracking, it does not matter much which technique you use to allocate the weight of the phases as long as you apply the technique consistently across all activities. In most decent-size projects, you will end up with hundreds of phases across all activities. On average, any discrepancies in assigning weights will cancel each other.
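As a minimal sketch of two of the allocation techniques mentioned above (by estimated phase duration, and by an equal split), the following Python helpers compute phase weights in percent. The function names and the per-phase day estimates are hypothetical and chosen only so that the result reproduces Table A-1.

```python
def weights_from_durations(durations):
    """Weight each phase by its share of the summed phase durations (percent)."""
    total = sum(durations.values())
    return {phase: 100.0 * days / total for phase, days in durations.items()}


def equal_weights(phases):
    """Give every phase the same weight (percent)."""
    return {phase: 100.0 / len(phases) for phase in phases}


# Hypothetical duration estimates in days (an assumption, not from the text).
durations = {
    "Requirements": 3,
    "Detailed Design": 4,
    "Test Plan": 2,
    "Construction": 8,
    "Integration": 3,
}

weights = weights_from_durations(durations)
even = equal_weights(list(durations))
print(weights)
print(even)
```

Whichever helper you pick, the point made in the text still holds: apply the same one to every activity so that discrepancies average out.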
Given the binary exit criterion and the weight of each phase, you can calculate the progress of each activity at any point in time. With tracking, progress is the completion status of an activity (or of the entire project) as a percentage.
The formula for the progress of an activity is:

$$A(t) = \sum_{j=1}^{m} W_j$$

where:
$W_j$ is the weight of phase $j$ of the activity.
$m$ is the number of completed phases of the activity at time $t$.
$t$ is a point in time.
The progress of the activity at the time t is the sum of the weights of all the phases that are completed by the time t. For example, using Table A-1, if the first three phases (Requirements, Detailed Design, and Test Plan) are complete, then the activity is 45% complete (15 + 20 + 10).
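The calculation above can be sketched in a few lines of Python. The `Phase` class and `activity_progress` function are illustrative names, not something the text prescribes; the `done` flag stands in for the binary exit criterion of each phase, and the weights come from Table A-1.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    weight: float       # contribution to the activity, in percent
    done: bool = False  # binary exit criterion: either met or not met


def activity_progress(phases):
    """Sum the weights of the phases whose exit criterion has been met."""
    return sum(p.weight for p in phases if p.done)


phases = [
    Phase("Requirements", 15, done=True),
    Phase("Detailed Design", 20, done=True),
    Phase("Test Plan", 10, done=True),
    Phase("Construction", 40),
    Phase("Integration", 15),
]

print(activity_progress(phases))  # 45
```

Note that a phase that is "almost done" contributes nothing here; only phases that passed their exit criterion count.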
Similarly to calculating the progress of an activity, you can and should keep track of the effort spent on each activity. With tracking, effort is the amount of direct cost spent on the activity (or on the entire project) as a percentage of the estimated direct cost for the activity (or for the entire project). The formula for the effort of an activity is:

$$C(t) = \frac{S(t)}{R}$$

where:
$S(t)$ is the cumulative direct cost spent on the activity at time $t$.
$R$ is the estimated direct cost for the activity.
$t$ is a point in time.
It is crucial to note that effort is unrelated to progress. For example, an activity estimated at 10 days duration with fixed resources could be only 60% complete 15 days after starting. This activity has already cost 150% of its planned direct cost.
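To make the example concrete, here is a small Python sketch of the effort calculation. The function name is hypothetical, and it assumes direct cost is measured in person-days with a single resource on the activity, so 15 elapsed days equal 15 days of direct cost.

```python
def activity_effort(spent, estimated):
    """Cumulative direct cost spent as a percentage of the estimated direct cost."""
    return 100.0 * spent / estimated


# The example from the text: estimated at 10 days, 15 days already spent.
effort = activity_effort(spent=15, estimated=10)
progress = 60.0  # measured separately, from the completed phases

print(f"progress {progress:.0f}%, effort {effort:.0f}%")  # progress 60%, effort 150%
```

Keeping the two numbers as separate measurements is what lets the gap between them show up at all.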
The formula for the progress of the project is:

$$P(t) = \frac{\sum_{i=1}^{N} E_i \cdot A_i(t)}{\sum_{i=1}^{N} E_i}$$

where:
$E_i$ is the estimated duration of activity $i$.
$A_i(t)$ is the progress of activity $i$ at time $t$.
$t$ is a point in time.
$N$ is the number of activities in the project.
The overall project progress at time t is a ratio between two sums of estimations. The first is the sum, over all activities, of each activity's estimated duration multiplied by that activity's progress. The second is the sum of all activity estimations. Note that this simple formula provides the progress of the project across all activities, developers, life cycles, and phases.
Chapter 7 discussed the concept of earned value. The formula for the planned earned value as a function of time and the formula for the progress of the project are very similar. If all activities complete exactly as planned, then the progress over time will match the planned earned value, the planned shallow S curve of the project. The progress of the project is simply the actual earned value to date.
To illustrate this point, consider the simple project in Table A-2. Suppose at the time t the UI activity is only 45% complete. Since 45% of 20% is 9%, the work done so far in the UI activity has earned 9% toward the completion of the project. In much the same way, you can calculate the actual earned value of all activities in the project at time t.
Table A-2 Example project current progress
| Activity | Duration | Value (%) | Completed (%) | Actual Earned Value (%) |
|---|---|---|---|---|
| Front End | 40 | 20 | 100 | 20 |
| Access Service | 30 | 15 | 75 | 11.25 |
| UI | 40 | 20 | 45 | 9 |
| Manager Service | 20 | 10 | 0 | 0 |
| Utility Service | 40 | 20 | 0 | 0 |
| System Testing | 30 | 15 | 0 | 0 |
| Total | 200 | 100 | - | 40.25 |
Summing up the actual earned value of all activities in Table A-2 reveals that the project is 40.25% complete at time t. This is the same value produced by the progress formula:

$$\frac{40 \cdot 1.0 + 30 \cdot 0.75 + 40 \cdot 0.45 + 20 \cdot 0 + 40 \cdot 0 + 30 \cdot 0}{200} = \frac{80.5}{200} = 40.25\%$$
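The same result can be checked with a short Python sketch that weights each activity's progress by its estimated duration. The data is taken from Table A-2; the function name and the tuple layout are illustrative choices.

```python
# (activity, estimated duration, completed %) from Table A-2
activities = [
    ("Front End", 40, 100),
    ("Access Service", 30, 75),
    ("UI", 40, 45),
    ("Manager Service", 20, 0),
    ("Utility Service", 40, 0),
    ("System Testing", 30, 0),
]


def project_progress(activities):
    """Duration-weighted average of the activities' progress, in percent."""
    total_duration = sum(duration for _, duration, _ in activities)
    earned = sum(duration * completed / 100.0 for _, duration, completed in activities)
    return 100.0 * earned / total_duration


progress = project_progress(activities)
print(progress)  # 40.25
```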
The formula for the effort of the project is:

$$D(t) = \frac{\sum_{i=1}^{N} R_i \cdot C_i(t)}{\sum_{i=1}^{N} R_i} = \frac{\sum_{i=1}^{N} S_i(t)}{\sum_{i=1}^{N} R_i}$$

where:
$R_i$ is the estimated direct cost of activity $i$.
$C_i(t)$ is the effort of activity $i$ at time $t$.
$S_i(t)$ is the cumulative direct cost spent on activity $i$ at time $t$.
$t$ is a point in time.
$N$ is the number of activities in the project.
The overall project effort is simply the sum of direct cost spent across all activities divided by the sum of all direct cost estimations of all activities. This provides effort as the overall direct cost expenditure as a percentage of the planned direct cost of the project.
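A minimal Python sketch of the project effort follows. It assumes one resource per activity, so the estimated direct cost of each activity equals its duration in Table A-2; the amounts spent so far are hypothetical values made up for illustration.

```python
def project_effort(spent, estimated):
    """Total direct cost spent as a percentage of the total estimated direct cost."""
    return 100.0 * sum(spent.values()) / sum(estimated.values())


# Estimated direct cost per activity (person-days), mirroring Table A-2 durations.
estimated = {
    "Front End": 40,
    "Access Service": 30,
    "UI": 40,
    "Manager Service": 20,
    "Utility Service": 40,
    "System Testing": 30,
}

# Hypothetical direct cost spent to date; activities not yet started spent nothing.
spent = {"Front End": 44, "Access Service": 26, "UI": 20}

effort = project_effort(spent, estimated)
print(effort)  # 45.0
```

With these made-up numbers the project has consumed 45% of its planned direct cost, a figure you would then compare against the measured progress.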
Again note the similarity of the project effort to the planned earned value formula. If each activity is assigned to one resource, and the activities end up costing exactly as planned and complete on the planned dates, then the effort curve will match the planned earned value curve. If more (or less) than one resource is planned per activity, then you will have to track the effort against its own planned direct cost curve. However, in most projects the two curves should match closely. For simplicity’s sake, the rest of this appendix assumes that each activity is planned for one resource.
The indirect cost of the project is mostly a function of time and the structure of the team; it is independent of the effort or progress of individual activities. You can use a technique similar to those described so far to find the present status of the indirect cost. You need to identify the team members who contribute to indirect cost (such as the core team, DevOps, or testers) and keep track of the time they spend on the project minus their direct cost, if any.
Since indirect cost is independent of both the progress and effort of the project, tracking indirect cost is not terribly useful. All you are likely to see is a straight line going up, which does not help to suggest any corrective actions.
Tracking indirect cost is helpful, however, in one case: when reporting the total cost of the project to date, in which case you should add the indirect cost to the direct cost. The rest of this appendix looks at only the accumulated direct cost (the effort) when tracking the project and comparing it with the plan.
Combining the actual progress of the project with the effort allows you to find the current status of the project. You should repeat these calculations at a recurring interval. This allows you to plot how the project’s progress and effort fare with respect to the project’s planned earned value. Figure A-3 demonstrates this form of tracking for an example project.
Figure A-3 Sample project tracking
The blue line in Figure A-3 shows the planned earned value of the project. The planned earned value should have been a shallow S curve; you will see shortly why it deviated from that form in this example. To the point in time shown on the graph, the green line shows the actual progress of the project (the actual earned value) and the red line illustrates the effort spent.
Project tracking allows you to see exactly where the project is and where it has been. The real question, however, is not what the current status of the project is, but rather where the project is heading. To answer that question, you can project the progress and effort curves. Consider the generic project view of Figure A-4.
Figure A-4 Progress and effort projections
For simplicity, Figure A-4 replaces the shallow S curves with their linear regression trend lines, shown as solid lines in the figure. The blue line represents the planned earned value of the project. Ideally the green progress line and the red effort line should match the blue line. The project is expected to complete when the planned earned value reaches 100%, point 1 in Figure A-4. However, you can see that the green line (actual progress) is below the plan.
If you extrapolate the green progress line, you get the dashed green line in Figure A-4. You can see that by the time of point 1, the projected progress line reaches only about 65% of completion (point 2 in Figure A-4). The project will actually complete when the projected progress line reaches 100%, or point 3 in Figure A-4. The time of point 3 is point 4 in Figure A-4, and the difference between points 4 and 1 is the projected schedule overrun.
Much the same way, you can project the measured effort line and find point 5 in Figure A-4. The difference in effort between points 5 and 3 in Figure A-4 is the projected direct cost overrun (in percentage) of the project.
Suppose this is a year-long project, and you measure the project on a weekly interval. A month into the project you already have four reference points, enough to run a regression trend line that is well fitted to the measured progress and effort. Recall from Chapter 7 that the pitch or slope of the earned value curve represents the throughput of the team. Therefore, a month into a year-long project, you already have a good idea where the project is heading via a projection that is highly calibrated to the actual throughput of the team. The initial planned earned value was just that—initial. The projected progress and effort lines are what will likely happen.
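The projection described above can be sketched with an ordinary least-squares trend line. The weekly measurements below are hypothetical, the 52-week plan is an assumption matching the year-long example, and the function names are illustrative.

```python
def fit_line(ts, ys):
    """Least-squares fit of y = slope * t + intercept."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
             / sum((t - mean_t) ** 2 for t in ts))
    return slope, mean_y - slope * mean_t


def time_to_reach(slope, intercept, target=100.0):
    """Time at which the trend line reaches the target percentage."""
    return (target - intercept) / slope


weeks = [1, 2, 3, 4]
progress = [1.5, 3.0, 4.5, 6.0]  # hypothetical measured progress, percent
effort = [2.0, 4.0, 6.0, 8.0]    # hypothetical measured effort, percent
planned_weeks = 52

p_slope, p_icpt = fit_line(weeks, progress)
e_slope, e_icpt = fit_line(weeks, effort)

finish = time_to_reach(p_slope, p_icpt)               # when progress hits 100%
progress_at_plan = p_slope * planned_weeks + p_icpt   # progress at the planned date
schedule_overrun = finish - planned_weeks             # weeks past the planned date
cost_overrun = e_slope * finish + e_icpt - 100.0      # effort at finish, above 100%

print(round(finish, 1), round(progress_at_plan, 1),
      round(schedule_overrun, 1), round(cost_overrun, 1))
```

In terms of Figure A-4, `progress_at_plan` corresponds to point 2, `finish` to the time of point 3 (point 4), and `cost_overrun` to the gap between points 5 and 3.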
Figure A-5 shows the actual projections for Figure A-3. Given the terms of the projection, the project is expected to have about a month of schedule slip (or 13%) and some 8% effort overrun.
Figure A-5 Sample projected progress and effort
In Figure A-3 and Figure A-5, the planned earned value is a truncated shallow S curve because tracking started for this project after the SDP review. By deliberately dropping the very shallow start of the plan, the linear trend line projections become a better fit to the curves.
Projecting the progress and the effort provides an unmatched ability to see where the project is and where it is heading. It then becomes possible to raise the bar again and discuss remedies. Note that when issues arise, it is important to treat the underlying problem, not the symptom of the problem. For example, missing the deadline or exceeding the planned effort are both symptoms of the problem, not the underlying problem itself. This section contains the common set of symptoms that you will encounter, the possible corrective actions to take, and even recommendations for the best course of action.
Consider the progress and effort projections of Figure A-6. In the figure, the projected progress and effort lines coincide with the plan, and the project is poised to deliver on its commitments. You need do nothing about this state of affairs; there is no need to help or try to improve matters. Knowing when not to do something is as important as knowing when to do something.
Figure A-6 All is well in this project.
Having the progress and the effort align to such a degree with the plan should be the natural state of affairs in any project because that is the only way to meet your commitments. Most people have the wrong mental model for what it takes to meet the deadline. Many think that during the project they can drift away from the commitments and then, via heroic action and determination, they can meet the deadline at the end. While that could be the case, the chances of this happening are slim, and it is certainly not a repeatable expectation. Most projects do not have heroes, and the project cannot survive drastic gyrations.
The cardinal rule of project management is:
The only way to meet the deadline at the end is to be on time throughout the project.
Staying on the original plan (or on a revised plan) will never happen on its own and requires constant tracking by the project manager and numerous corrective actions throughout the project execution. You must respond to the information revealed by the trajectory of the projections and avoid opening a gap between the progress, the effort, and the plan.
Consider the earned value and effort projections of Figure A-7. This project is clearly not doing well. Progress is below the plan, while effort is above the plan. The likely explanation is that you have underestimated the project and its activities.
Figure A-7 Projections indicating underestimating
There are two obvious corrective actions when dealing with underestimation. The first is to revise the estimations upward based on the (now known) throughput of the team. In fact, you can see when the projected progress line reaches 100%, and that point in time becomes the new completion date of the project. Effectively, you will be pushing down the blue plan line until it meets the green progress line. This is a typical remedy in a feature-driven project, where you must achieve parity with a competing product or a legacy system, and there is no point in releasing the system while missing key aspects.
However, pushing the deadline out will not do in a date-driven project where you must release on a set date. In this case, you should take the second type of corrective action: reduce the scope of the project. By reducing the scope, the earned value of what the team has produced so far counts more, and the green progress line will come up to meet the blue plan line.
You can certainly apply a combination of pushing the deadline and reducing the scope, and the progress projection will tell you exactly how much or how little of each remedy is required. Whichever response you choose will require redesigning the project.
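One way to read the two remedies off the progress projection is sketched below. It rests on simplifying assumptions that are not stated in the text: the team's throughput stays constant, the progress trend line is `slope * t + intercept` in percent per week, and any scope cut removes only work that has not been done yet, so the remaining scope is simply the progress the team is projected to reach by the chosen date. The numbers are hypothetical.

```python
def new_deadline(slope, intercept):
    """Date at which the projected progress reaches 100% with the full scope."""
    return (100.0 - intercept) / slope


def scope_for_deadline(slope, intercept, deadline):
    """Percentage of the original scope the team is projected to finish by the deadline."""
    return min(slope * deadline + intercept, 100.0)


# Hypothetical trend line: 1.25% of the project earned per week, starting from zero.
slope, intercept = 1.25, 0.0
planned_deadline = 52  # weeks

push_only = new_deadline(slope, intercept)                          # keep the scope
cut_only = scope_for_deadline(slope, intercept, planned_deadline)   # keep the date
combined = scope_for_deadline(slope, intercept, 64)                 # slip 12 weeks, cut the rest

print(push_only, cut_only, combined)  # 80.0 65.0 80.0
```

Under these assumptions the team either finishes the full scope in week 80, ships 65% of the scope in week 52, or ships 80% of it in week 64. Any of these choices still requires redesigning the project, as noted above.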
Sadly, the knee-jerk reaction of many who do not wish to compromise on the deadline or the scope is to throw more people on the project. As Dr. Frederick Brooks observed, this is like trying to put out a fire by dousing it with gasoline [1].
1. Frederick P. Brooks, The Mythical Man-Month (Addison-Wesley, 1975).
There are several reasons why adding people to a late project almost always makes matters much worse. First, even if adding people brings the green progress line closer to the blue plan, it will make the red effort line shoot up. It does not make sense to supposedly fix one aspect of the project by breaking another (especially if the project is already using more people than planned, as in Figure A-7). Second, you will have to onboard and train the new people. This requires interrupting the other team members, who often are the most qualified individuals and, importantly, are likely on the critical path; halting or slowing down their work will mean incurring a further delay to the project. You will end up paying both for the ramp-up time for the new people and for the time lost from the existing team who are assisting with the onboarding. Finally, even without the onboarding cost, the new team will be larger—and hence less efficient.
这条规则有一个例外,即在项目开始的时候。一开始,你可以投资于团队成员的全面入职培训。更重要的是,你可以在开始的时候增加人员,因为你可以转向更积极、更压缩的项目设计解决方案。由于并行工作,这样的解决方案通常需要额外的资源。请注意,压缩项目会带来更高程度的风险和复杂性,因此你需要仔细权衡新解决方案的全部影响。
There is one exception to this rule, which is near the project’s origin. At the beginning you can invest in wholesale onboarding of the team members. More importantly, you can get away with adding people at the origin because you can pivot to a more aggressive, compressed project design solution. Such a solution typically does require additional resources due to the parallel work. Note that compressing the project will introduce a higher level of risk and complexity, so you need to carefully weigh the full effect of the new solution.
考虑图 A-8中的进度和工作量预测。在这个项目中,进度和工作量都在计划之内,而且进度甚至低于工作量。这通常是资源泄漏的结果:有人被分配到你的项目,但他们正在为别人的项目工作。结果,他们无法投入所需的工作量,进度进一步滞后。资源泄漏在软件行业很普遍,我观察到泄漏量高达工作量的 50%。
Consider the progress and effort projections of Figure A-8. In this project, both the progress and the effort are under the plan, and progress is even under the effort. This is often the result of resource leaks: People are assigned to your project but they are working on someone else’s project. As a result, they cannot spend the required effort, and progress lags further. Resource leaks are endemic in the software industry, and I have observed leaks as high as 50% of the effort.
图 A-8资源泄漏预测
Figure A-8 Projections indicating resource leak
当发现资源泄漏时,本能反应是简单地堵住泄漏。然而,堵住泄漏往往会适得其反:它可能会引爆另一个项目,而你却成为罪魁祸首。最好的解决办法是召集你的团队发生泄漏的项目的项目经理、你自己以及负责这两个项目的最低级别经理开会。在展示预测图(如图A-8所示)后,你向监督经理提出两个选项。如果另一个项目比你自己的项目更重要,那么图 A-8中的绿线代表你的团队在这种新情况下可以生产的产品,并且截止日期必须调整以适应这一点。但是如果你的项目更重要,那么另一个团队的项目经理必须立即撤销你团队成员的所有源代码控制访问权限,甚至可能将另一个项目的一些顶级资源分配给你的项目,以弥补已经造成的损失。通过以这种方式提出解决方案,无论经理做出什么决定,你都赢了,并重新获得了履行承诺的机会。
When identifying a resource leak, the natural instinct is to simply plug the leak. Plugging the leak, however, will tend to backfire: It could detonate the other project while making you the culprit. The best resolution is to call a meeting between the project manager of the project into which your team is leaking, yourself, and the lowest-level manager responsible for both. After showing the projections chart (such as Figure A-8), you present the overseeing manager with two options. If the other project is more important than your own project, then the green line in Figure A-8 represents what your team can produce under these new circumstances, and the deadline must move to accommodate that. But if your project is more important, then the project manager of the other team must immediately revoke all source control access from your team members and perhaps even assign a few of the other project’s top resources to your project to compensate for the damage already done. By presenting the resolution options this way, whatever the manager decides, you win and regain the chance to meet your commitments.
考虑图 A-9中的进度和工作量预测。虽然看起来这个项目进展良好,因为进度高于计划,但实际上由于估计过高,项目处于危险之中。如第 7 章所述,估计过高和估计过低一样致命。该项目的另一个问题是图 A-9表示项目花费的努力超出了计划要求。这可能是因为分配了太多人员到项目中,或者项目以计划外的并行方式进行。
Consider the progress and effort projections of Figure A-9. While it may look like this project is doing very well because progress is above plan, in reality the project is in danger due to overestimating. As discussed in Chapter 7, overestimating is just as deadly as underestimating. An additional problem with the project in Figure A-9 is that the project is spending more effort than what the plan called for. This may be because too many people were assigned to the project or because the project is working in an unplanned parallel manner.
图 A-9预测表明高估
Figure A-9 Projections indicating overestimating
纠正高估的一个简单方法是下调估计并缩短最后期限。图 A-9中的蓝色计划线将与绿色进度线相交,你就可以计算出需要多少时间才能完成这一目标。不幸的是,缩短最后期限可能只有缺点。提前交付系统往往没有任何好处。例如,客户可能要到商定的最后期限才会付款,或者服务器可能还没有准备好,又或者团队接下来无事可做。同时,缩短持续时间会增加团队的压力。人们对压力的反应是非线性的。适度的压力可能会产生积极的结果,而过大的压力则会打击积极性。如果团队成员通过放弃来应对压力,项目就会崩溃。通常很难知道那条细线在哪里。
One simple corrective action for overestimating is to revise the estimations downward and bring the deadline in. The blue plan line in Figure A-9 will then come up to meet the green progress line, and you can calculate by how much to do just that. Unfortunately, bringing the deadline in likely has only downsides. Often delivering the system ahead of schedule has no benefits. For example, the customer may not pay until the agreed deadline, or the servers may not be ready, or the team might have nothing to do next. At the same time, reducing the duration will increase the pressure on the team. The way people respond to pressure is nonlinear. Some modest pressure may have positive results, whereas excessive pressure demotivates. If the team members respond to the pressure by giving up, the project implodes. It is usually hard to know where that thin line is.
另一个纠正措施是保持截止日期不变,但向下修正估计值,并通过增加项目范围来超额交付。增加要做的事情(也许开始下一个子系统的工作)将减少实际挣值,绿色进度线将下降以与图 A-9中的蓝色计划线相交。增加价值始终是件好事,但它确实会带来压力过大的风险。
Another corrective action is to keep the deadline where it is but revise the estimations downward and over-deliver by increasing the scope of the project. Adding things to do (perhaps starting work on the next subsystem) will reduce the actual earned value and the green progress line will come down to meet the blue plan line in Figure A-9. Adding value is always a good thing, but it does carry the over-pressure risk.
解决高估问题的最佳方法是释放部分资源。这样做后,红色工作量线会下降,因为小团队的成本更低。绿色进度线也会下降,因为小团队的产出量会减少。小团队也应该更有效率。如果你足够早地发现高估,你甚至可以选择一个压缩程度较低的项目解决方案。
The best way of fixing overestimation is to release some of your resources. When you do so, the red effort line will go down since a smaller team costs less. The green progress line will come down because the smaller team has a reduced throughput. The smaller team should also be more efficient. If you detect the overestimation early enough, you can even choose a less compressed project solution.
预测可让您在潜在问题变得严重之前分析项目的走向。再次检查图 A-4。等到项目达到图中的点 2,然后将其纠正到蓝线,这需要痛苦的,甚至是毁灭性的操作。使用预测,您可以更早地发现趋势,并在线条之间出现任何明显差距之前进行较小的修正。行动越早,生效的时间就越长,对项目其余部分的干扰就越小,越容易通过管理层,成功的可能性就越大。主动总是比被动更好,一分预防往往胜过十分治疗。
The projections allow you to analyze where the project is heading long before an underlying problem becomes severe. Examine Figure A-4 again. Waiting until the project reaches point 2 in the figure and then correcting it up to the blue line requires a painful, if not devastating, maneuver. Using projections, you can detect the trend much earlier, and perform a smaller correction before any significant gap appears between the lines. The earlier the action, the more time it has to take effect, the less disruptive it is to the rest of the project, the easier it is to run it past management, and the more likely it is to succeed. It is always better to be proactive than reactive, and an ounce of prevention is often worth a pound of cure.
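As a rough illustration of detecting the trend early, the sketch below fits a straight least-squares line through weekly earned-value samples and extrapolates it to 100%. It assumes the projection is a simple linear trend; the class name `ProgressProjection`, the sample data, and the 40-week planned deadline are all hypothetical and not taken from the text.

```csharp
using System;
using System.Linq;

static class ProgressProjection
{
    // Least-squares line y = slope * x + intercept through the samples.
    public static (double Slope, double Intercept) FitLine(double[] x, double[] y)
    {
        if (x.Length != y.Length || x.Length < 2)
            throw new ArgumentException("Need at least two paired samples.");

        double meanX = x.Average();
        double meanY = y.Average();
        double sxy = 0, sxx = 0;
        for (int i = 0; i < x.Length; i++)
        {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0)
            throw new ArgumentException("Samples must span more than one point in time.");

        double slope = sxy / sxx;
        return (slope, meanY - slope * meanX);
    }

    // Week at which the fitted trend line reaches 100% earned value.
    // Returns infinity when the trend is flat or declining.
    public static double ProjectedCompletionWeek(double[] weeks, double[] earnedValuePercent)
    {
        var (slope, intercept) = FitLine(weeks, earnedValuePercent);
        if (slope <= 0)
            return double.PositiveInfinity;
        return (100.0 - intercept) / slope;
    }

    static void Main()
    {
        // Hypothetical tracking data: earned value (%) at the end of each week.
        double[] weeks = { 1, 2, 3, 4, 5 };
        double[] progress = { 2, 4, 6, 8, 10 };
        double plannedDeadlineWeek = 40; // assumed plan, for illustration only

        double projected = ProjectedCompletionWeek(weeks, progress);
        Console.WriteLine($"Projected completion: week {projected:F1}");
        Console.WriteLine($"Projected slip: {projected - plannedDeadlineWeek:F1} weeks");
    }
}
```

With only five weeks of data, the gap between the plan and actual progress is still small, yet the extrapolated line already shows how far the project would overshoot the deadline. That is the moment when a small correction is still cheap.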
就像开车一样,在项目执行过程中,你会经常做出小幅修正,而不是进行几次大幅度修正。好的项目总是一帆风顺的,无论是在计划的挣值、人员分配图还是在本例中的进度和工作量线方面。
Very much like driving a car, in your project execution you make frequent small corrections as opposed to a few drastic ones. Good projects are always smooth, whether in the planned earned value, the staffing distribution chart or, as in this case, the progress and effort lines.
请注意,此处展示的技术是分析项目的趋势,而不是实际项目。这是推动项目的正确方法。再次使用汽车类比,您不能通过低头看人行道或严格看后视镜来驾驶汽车前进。汽车现在在哪里或曾经在哪里与驾驶它前进基本无关。您驾驶汽车时要看汽车将要去的地方,并根据该预测采取纠正措施。
Note that the technique shown here is analyzing the trend of the project, not the actual project. This is the correct way of driving the project. To use the car analogy again, you do not drive your car forward by looking down at the pavement or looking strictly in the rear-view mirror. Where the car is now or where it has been is largely irrelevant for driving it forward. You drive your car looking at where the car is going to be and taking corrective actions against that projection.
请注意,“project”既可以是名词(the project,项目),也可以是动词(to project,预测)。这并非偶然。项目的本质是预测的能力。之所以被称为项目,是因为你应该进行预测。反之,如果你不进行预测,你就没有项目。
Note that “project” can be both a noun (the project) and a verb (to project). This is not accidental. The essence of a project is the ability to project. It is called a project because you are supposed to project. Conversely, if you do not project, you do not have a project.
奇怪的是,管理层甚至可能试图改变项目范围,而不修改分配给项目的持续时间和资源。这反过来会给你履行承诺带来问题。
Oddly, management may even try to change the scope of the project without modifying the duration and resources assigned to the project. This, in turn, creates a problem with you meeting your commitments.
将预测与项目设计相结合是处理项目范围意外变化的最终方法。当有人试图增加(或减少)项目范围并征求您的批准或同意时,您应该礼貌地要求回复他们。现在您需要重新设计项目以评估变更的后果。如果变更不影响关键路径或成本并且在团队的能力范围内,则重新设计可能很小。使用预测从实际吞吐量和成本的角度判断您执行新计划的能力。当然,变更可能会延长项目的持续时间并增加成本和对资源的需求。您可能必须选择另一个项目设计选项,甚至完全设计新的项目设计选项。
Combining projections with project design is the ultimate way of handling unanticipated changes to the scope of the project. When anyone tries to increase (or decrease) the scope of the project and asks for your approval or consent, you should politely ask to get back to them with your answer. You now need to redesign the project to assess the consequences of the change. This redesign could be minor if the change does not affect the critical path or the cost and is within the capability of the team. Use the projections to judge your ability to deliver on the new plan from the perspective of the actual throughput and cost. Of course, the change could extend the duration of the project and increase the cost and demand for resources. You may have to choose another project design option, or even devise new project design options altogether.
当您回到管理层时,请提出变更所需的新工期和总成本,包括新的预测,并询问他们是否愿意这样做。如果他们无法承担新的时间表和成本影响,那么实际上什么都没有改变。如果他们接受,那么您就有了新的项目时间表和成本承诺。无论哪种方式,您都会始终履行承诺。这些承诺可能不是项目开始时的原始承诺,但话又说回来,您不是改变计划的人。
When you get back to management, present the new duration and total cost that the change requires, including the new projections, and ask if they want to do it. If they cannot afford the new schedule and cost implications, then nothing really changed. If they accept them, then you have new schedule and cost commitments for the project. Either way, you will always meet your commitments. These commitments may not be the original ones the project started with, but then again, you are not the one who changed the plan.
大多数软件团队都未能履行承诺。他们没有给管理层任何信任他们的理由,却给了管理层不信任他们的理由。因此,管理层规定了不可能的最后期限,同时完全预料到这些期限会延误。如第 7 章所述,激进的最后期限大大降低了成功的可能性,将失败表现为自我实现的预言。
Most software teams fail to meet their commitments. They have given management no reason to trust them and every reason to distrust them. As a result, management dictates impossible deadlines, while fully expecting them to slip. As discussed in Chapter 7, aggressive deadlines drastically reduce the probability of success, manifesting failure as a self-fulfilling prophecy.
项目跟踪是打破这种恶性循环的好方法。您应该与每个可能的决策者和经理分享预测。不断展示项目当前的良性状态和未来趋势。展示在问题出现前几个月发现问题的能力。坚持(或只是采取)纠正措施。所有这些行动都将使您成为负责任、可问责、值得信赖的专业人士。这将赢得尊重并最终赢得信任。当您获得上级的信任时,他们往往会让您独自完成工作,让您取得成功。
Project tracking is a good way of breaking that vicious cycle. You should share the projections with every possible decision maker and manager. Constantly show the project’s current benign status and the future trends. Demonstrate the ability to detect problems months before they raise their head. Insist on (or just take) corrective actions. All of these actions will establish you as a responsible, accountable, trustworthy professional. This will lead to respect and eventually trust. When you have gained the trust of those above you, they will tend to leave you alone to do your work, allowing you to succeed.
本书的第一部分介绍了系统架构:如何将系统分解为组件和服务,以及如何从服务中组合出所需的行为。这并不是设计的结束,您必须通过设计每个服务的细节来继续该过程。
The first part of this book addressed the system architecture: How to decompose the system into its components and services and how to compose the required behavior out of the services. This is not the end of the design and you must continue the process by designing the details of each service.
详细设计是一个庞大的话题,值得专门写一本书来讨论。本附录将详细设计的讨论限制在服务设计的最重要方面:服务向其客户提供的公共契约。只有在确定了服务契约之后,您才能填写内部设计细节,例如类层次结构和相关设计模式。这些内部设计细节以及数据契约和操作参数都是特定于领域的,因此不在本附录的讨论范围内。但是,从理论上讲,这里为整个服务契约概述的相同设计原则甚至适用于数据契约和参数级别。
Detailed design is a vast topic, worthy of its own book. This appendix limits its discussion of detailed design to the most important aspect of the design of a service: the public contract that the service presents to its clients. Only after you have settled on the service contract can you fill in internal design details such as class hierarchies and related design patterns. These internal design details as well as data contracts and operation parameters are domain-specific and, therefore, outside the scope of this appendix. However, in the abstract, the same design principles outlined here for the service contract as a whole apply even at the data contract and parameter levels.
本附录表明,即使对于像设计服务契约这样特定于您的系统的任务,某些设计准则和指标也超越了服务技术、行业领域或团队。虽然本附录中的想法很简单,但它们对您开发服务和构建工作的方式有着深远的影响。
This appendix shows that even with a task as specific to your system as the design of contracts for your services, certain design guidelines and metrics transcend service technology, industry domains, or teams. While the ideas in this appendix are simple, they have profound implications for the way you go about developing your services and structuring the construction work.
要理解如何设计服务,首先必须认识到好设计与坏设计的属性。考虑图 B-1中的系统架构。这对您的系统来说是个好的设计吗?图 B-1中的系统设计使用一个大型组件来实现系统的所有要求。理论上,您可以通过将所有代码放在一个庞大的函数中来构建任何系统,该函数包含数百个参数和数百万行嵌套的条件代码。然而,没有一个头脑正常的人会认为单个大型组件就是好的设计。它实际上是一个典型的例子,说明了什么不该做。根据第 4 章,您也无法验证这样的设计。
To understand how to design the services, you must first recognize the attributes of a good or a bad design. Consider the system architecture in Figure B-1. Is this a good design for your system? The system design in Figure B-1 uses a single large component to implement all the requirements of the system. In theory, you could build any system this way, by putting all the code in one monstrous function, with hundreds of arguments and millions of nested lines of conditional code. Yet no one in their right mind would suggest that a single large thing is a good design. It is literally the canonical example of what not to do. According to Chapter 4, you also cannot validate such a design.
图 B-1单片系统设计
Figure B-1 Monolithic system design
接下来,考虑图 B-2中的设计。这对您的系统来说是一个好的设计吗?图 B-2中的系统设计使用大量小组件或服务来实现系统(为了减少视觉混乱,图中未显示跨服务交互线)。理论上,您可以通过将每个需求放在单独的服务中来以这种方式构建任何系统。这不仅是一个糟糕的设计,而且是另一个不该做的典型例子。与上一种情况一样,您也无法验证这样的设计。
Next, consider the design in Figure B-2. Is this a good design for your system? The system design in Figure B-2 uses a huge number of small components or services to implement the system (to reduce the visual clutter, the figure does not show cross-service interaction lines). In theory, you could build any system this way by placing every requirement in a separate service. That, too, is not just a bad design, but another canonical example of what not to do. As with the previous case, you also cannot validate such a design.
图 B-2超粒度系统设计
Figure B-2 Super-granular system design
最后,检查图 B-3中的系统设计。这对您的系统来说是一个好的设计吗?虽然您不能说图 B-3对您的系统来说是一个好的设计,但您可以说它肯定比单个大型组件或大量小型组件的设计更好。
Finally, examine the system design in Figure B-3. Is this a good design for your system? While you cannot state that Figure B-3 is a good design for your system, you could say that it is certainly a better design than a single large component or an explosion of small components.
图 B-3模块化系统设计
Figure B-3 Modular system design
能够确定图 B-3中的系统设计优于前两个设计,这令人惊讶。毕竟,你对系统的性质、领域、开发人员或技术一无所知,但你直觉地知道它更好。每当你评估模块化设计时,你都在使用图 B-4描述的心理模型。
The ability to determine that the system design of Figure B-3 is better than the previous two is surprising. After all, you do not know anything about the nature of the system, the domain, the developers, or the technology—yet you intuitively know it is better. Whenever you evaluate a modular design, you are using a mental model described by Figure B-4.
图 B-4规模和数量对成本的影响 [图片采用并修改自 Juval Lowy 的《Programming .NET Components》第 2 版 (O'Reilly Media, 2003);Juval Lowy 的《Programming WCF Services》第 1 版 (O'Reilly Media, 2007);以及 Edward Yourdon 和 Larry Constantine 的《Structured Design》 (Prentice-Hall, 1979)。]
Figure B-4 Size and quantity effect on cost [Image adopted and modified from Juval Lowy, Programming .NET Components, 2nd ed. (O’Reilly Media, 2003); Juval Lowy, Programming WCF Services, 1st ed. (O’Reilly Media, 2007); and Edward Yourdon and Larry Constantine, Structured Design (Prentice-Hall, 1979).]
当您用较小的构建块(例如服务)构建系统时,您必须支付两部分成本:构建服务的成本和将服务组合在一起的成本。您可以在一个大型服务和无数个小服务之间的任何一点构建系统,图 B-4显示了该分解决策对构建系统成本的影响。
When you build a system out of smaller building blocks such as services, you have to pay for two elements of cost: the cost of building the services and the cost of putting it all together. You can build a system at any point on the spectrum between one large service and countless little services, and Figure B-4 captures the effect of that decomposition decision on the cost of building the system.
每个服务的实施成本(图 B-4中的蓝线)表现出一些非线性行为。随着服务数量的减少,它们的大小会增加(在曲线最左侧最多为一个大型整体)。问题是,随着服务规模的增加,其复杂性也会以非线性方式增加。一个服务是另一个服务的 2 倍,其复杂性可能高出 4 倍,而一个服务是另一个服务的 4 倍,其复杂性可能高出 20 倍或 100 倍。复杂性的增加反过来又会导致成本的非线性增加。因此,成本是规模的复合、非线性、单调递增函数。因此,随着服务数量的减少,服务规模会增加,并且随着规模的增加,成本会以非线性方式激增。相比之下,对于具有多种服务的系统设计(图 B-4的最右侧),每个服务的成本微乎其微,接近于零。
The implementation cost per service (the blue line in Figure B-4) exhibits some nonlinear behavior. As the number of services decreases, their size increases (up to one large monolith on the far-left side of the curve). The problem is that as the size of a service increases, its complexity increases in a nonlinear way. A service 2 times as big as another may be 4 times more complex, and a service 4 times as big may be 20 or 100 times more complex. Increased complexity, in turn, induces a nonlinear increase in cost. As a result, cost is a compounded, nonlinear, monotonically increasing function of size. Consequently, as the number of services decreases, service size increases, and with each size increase, the cost explodes in a nonlinear way. In contrast, with a system design that has a multitude of services (the far right side of Figure B-4), the cost per service is minuscule, approaching zero.
服务的集成成本会随着服务数量的增加而呈非线性增加。这也是复杂性(在本例中是可能交互的复杂性)的结果。服务越多意味着可能的交互越多,从而增加了复杂性。如第 12 章所述,由于连通性和连锁反应,随着服务数量(n)的增加,复杂性与 n^2 成比例增长,甚至可能达到 n^n 的数量级。这种交互复杂性直接影响集成成本,这就是为什么集成成本(图 B-4中的红线)也是一条非线性曲线的原因。因此,在图 B-4的最右侧,集成成本随着服务数量的增加而不断飙升。相反,在曲线的最左侧,可能只有单个大型服务,由于没有什么可集成,因此集成成本接近于零。
The integration cost of services increases in a nonlinear way with the number of services. This, too, is the result of complexity, in this case the complexity of the possible interactions. More services imply more possible interactions, adding more complexity. As mentioned in Chapter 12, due to connectivity and ripple effects, as the number of services (n) increases, complexity grows in proportion to n^2 but can even be on the order of n^n. This interaction complexity directly affects the integration cost, which is why the integration cost (the red line in Figure B-4) is also a nonlinear curve. Consequently, at the far right side of Figure B-4, the integration cost shoots up ever higher as the number of services increases. In contrast, at the far left side of the curve where there is perhaps only a single large service, the integration cost approaches zero since there is nothing to integrate.
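One way to see where a quadratic term can come from is to count the possible pairwise interactions, assuming (purely for illustration) that any two services may interact directly:

```latex
\[
  I(n) \;=\; \binom{n}{2} \;=\; \frac{n(n-1)}{2} \;\approx\; \frac{n^{2}}{2},
  \qquad
  I(5) = 10, \quad I(10) = 45, \quad I(100) = 4950 .
\]
```

Going from 10 to 100 services multiplies the service count by 10 but the number of possible pairwise interactions by 110. This count covers only direct pairs; ripple effects through chains of services are what can push the growth beyond quadratic.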
对于任何给定的系统,您总是需要支付两个成本要素(实施成本和集成成本)。图 B-4中的绿色虚线表示这两个成本要素的总和,即系统总成本。如您所见,对于任何系统,都有一个最低成本区域,其中的服务不大不小,不多也不少。每当您设计一个系统时,您都必须将其带到最低成本区域(并保持在那里)。请注意,您不一定希望处于总成本曲线的最低点,而只是希望处于总系统成本相对平坦的最低成本区域。一旦曲线开始趋于平稳,找到绝对最小值的成本将超过系统成本的任何节省。如第 4 章所述,每项设计工作总是有一个收益递减点,在该点上它只是足够好。
With any given system you will always have to pay for both elements of cost (implementation cost and integration cost). The dashed green line in Figure B-4 represents the sum of these two cost elements, or the total system cost. As you can see, for any system there is an area of minimum cost, where the services are not too big and not too small, not too many and not too few. Whenever you design a system, you must bring it to the area of minimum cost (and keep it there). Note that you do not necessarily wish to be in the very minimum of the total cost curve, but merely in the area of minimum cost where the total system cost is relatively flat. Once the curve begins to level, the cost of finding the absolute minimum will exceed any savings in system cost. As mentioned in Chapter 4, every design effort always has a point of diminishing return where it is simply good enough.
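The shape of the total cost curve can be sketched with a toy model. The functional forms below are assumptions chosen only to illustrate the argument, not formulas from the text: a system of total size S is split into n equal services, the cost of one service grows with the square of its size, and the integration cost grows with the square of the service count. The constants k, A, and B are hypothetical.

```latex
\[
  C_{\text{impl}}(n) = n \cdot k\left(\frac{S}{n}\right)^{2} = \frac{A}{n}
  \quad (A = kS^{2}),
  \qquad
  C_{\text{int}}(n) = B\,n^{2},
  \qquad
  C_{\text{total}}(n) = \frac{A}{n} + B\,n^{2}.
\]
\[
  \frac{dC_{\text{total}}}{dn} = -\frac{A}{n^{2}} + 2Bn = 0
  \;\;\Longrightarrow\;\;
  n^{*} = \left(\frac{A}{2B}\right)^{1/3},
  \qquad
  \frac{d^{2}C_{\text{total}}}{dn^{2}} = \frac{2A}{n^{3}} + 2B > 0 .
\]
```

Because the first derivative vanishes at n*, the total cost changes very little for service counts near n*, which corresponds to the relatively flat area of minimum cost. Toward either edge, one of the two terms dominates and the total cost climbs steeply.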
你必须避免的是图表的边缘,因为这些边缘会非线性地恶化,并且成本会高出许多倍(甚至数十倍)。构建非线性更昂贵的系统的挑战在于,所有组织可用的工具基本上都是线性工具。组织可以给你一个又一个开发人员,或者给你一个月又一个月。但如果潜在问题的性质是非线性的,你将永远无法赶上。在最低成本区域之外设计的系统在有人编写第一行代码之前就已经失败了。
What you must avoid are the edges of the chart, because these edges are nonlinearly worse and become many multiples (even dozens of times) more expensive. The challenge with building a nonlinearly more expensive system is that the tools all organizations have at their disposal are fundamentally linear tools. The organization can give you another developer and then another developer, or another month and then another month. But if the nature of the underlying problem is nonlinear, you will never catch up. Systems designed outside the area of minimum cost have already failed before anyone has written the first line of code.
如第 4 章所述,良好的基于波动性的分解提供了最小的构建块集,您可以将其组合在一起以满足所有需求——已知和未知、现在和未来。这样的分解会产生最低成本区域中的服务数量,但它没有说明它们的形状。即使分解遵循方法指南,将服务保持在最低成本区域也需要您正确设计每个服务合同。
As explained in Chapter 4, a good volatility-based decomposition provides the smallest set of building blocks that you can put together to satisfy all requirements—known and unknown, present and future. Such a decomposition yields a service count in the area of minimum cost, but it says nothing about their shape. Even when the decomposition follows The Method guidelines, keeping the services in the area of minimum cost requires you to design each service contract correctly.
系统中的每个服务都会向其客户端公开一个契约。契约只是一组客户端可以调用的操作。因此,契约是服务向外界呈现的公共接口。许多编程语言甚至使用interface关键字来定义服务契约。虽然服务契约是一个接口,并非所有接口都是服务契约。服务契约是服务承诺支持的正式接口,不变。
Each service in the system exposes a contract to its clients. The contract is merely a set of operations that the clients can call. As such, the contract is the public interface that the service presents to the world. Many programming languages even use the interface keyword to define the service contract. While the service contract is an interface, not all interfaces are service contracts. Service contracts are a formal interface that the service commits to support, unchanged.
用人类世界的比喻来说,生活中充满了正式和非正式的合同。雇佣合同(通常使用法律术语)定义了雇主和雇员之间的义务。两家公司之间的商业合同定义了他们作为服务提供者和服务消费者之间的互动。这些都是正式的接口形式,如果合同双方违反合同或更改其条款,他们通常会面临严重后果。相比之下,当你叫出租车时,就存在一个隐含的非正式合同:司机会把你安全送到目的地,而你要为这项服务付费。你们双方都没有签署描述这种互动性质的正式合同。
To use an analogy from the human world, life is full of both formal and informal contracts. An employment contract defines (often using legal jargon) the obligations of both the employer and the employee to each other. A commercial contract between two companies defines their interactions as a service provider and a service consumer. These are formal forms of interfacing, and the parties to the contract often face severe implications if they violate the contract or change its terms. In contrast, when you hail a taxi, there is an implied informal contract: The driver will take you safely to your destination, and you will pay for this service. Neither of you signed a formal contract describing the nature of that interaction.
合同不仅仅是一个正式的接口:它代表了支持实体向外界展现的一个方面。例如,一个人可以签署一份雇佣合同,代表自己作为雇员的身份。这个人可以有其他身份,但雇主只会看到并关心这一特定身份。一个人可以签署其他合同,例如土地租赁合同、婚姻合同、抵押合同等。这些合同中的每一个都是这个人的一个身份:作为雇员、作为房东、作为配偶或作为房主。同样,一个服务可以支持多个合同。
A contract goes beyond being just a formal interface: It represents a facet of the supporting entity to the outside world. For example, a person can sign an employment contract representing that person as an employee. That person could have other facets, but the employer only sees and cares about that particular facet. A person can sign additional contracts such as a land lease contract, a marriage contract, a mortgage contract, and so on. Each one of these contracts is a facet of the person: as an employee, as a landlord, as a spouse, or as a homeowner. Similarly, a service can support more than one contract.
设计良好的服务位于图 B-4中的最低成本区域。不幸的是,很难回答这个区域内什么才是优质服务这一基本问题。您可以做的是进行一系列合理的简化,直到找到可以回答的问题。第一次简化假设服务与其契约之间的比率为一比一。根据这一假设,您可以重新标记图 B-4,将“服务”一词替换为“契约”,图表的行为将保持不变。
Well-designed services are in the area of minimum cost of Figure B-4. Unfortunately, it is difficult to answer the fundamental question of what makes a good service in this area. What you can do is go through a series of reasonable reductions until you find a question that you can answer. The first reduction assumes a one-to-one ratio between services and their contracts. Given this assumption, you could relabel Figure B-4, replacing the word “Service” with the word “Contract,” and the behavior of the chart will remain unchanged.
实际上,单个服务可以支持多个契约,多个服务也可以支持一个特定的契约。在这些情况下,图 B-4中的曲线会从左向右或上下移动,但它们的行为保持不变。
In reality, a single service can support multiple contracts, and multiple services can support a specific contract. In these cases, the curves in Figure B-4 shift left to right or up and down, but their behavior remains the same.
在服务和合同一一映射的假设下,你已经将“什么是好的服务?”的问题转化为“什么是好的合同?”的问题。好的合同是服务在逻辑上一致、有凝聚力和独立的方面。这些属性最好用日常生活中的类比来解释。
Under the assumption that services and contracts are mapped one-to-one, you have transformed the question “What is a good service?” into the question “What is a good contract?” Good contracts are logically consistent, cohesive, and independent facets of the service. These attributes are best explained using analogies from daily life.
您会签署一份规定您只能在特定地址居住才能在公司工作的雇佣合同吗?您会拒绝这样的合同,因为以您的地址为条件来决定您的就业状况在逻辑上是不一致的。毕竟,如果您按照预期的标准完成约定的工作,那么您住在哪里就无关紧要了。好的合同在逻辑上总是一致的。
Would you sign an employment contract that states you can only work at the company so long as you live at a specific address? You would reject such a contract because it is logically inconsistent to condition your employment status on your address. After all, if you do the agreed-upon work to the expected standard, where you live is irrelevant. Good contracts are always logically consistent.
您会签署没有明确说明工资数额的雇佣合同吗?同样,您会拒绝。好的合同总是具有连贯性,包含描述互动所需的所有方面 — 不多不少。
Would you sign an employment contract that does not specify how much you are paid? Again, you would reject it. Good contracts are always cohesive and contain all the aspects required to describe the interaction—no more, no less.
你会将婚姻合同置于雇佣合同之下吗?你会拒绝这份合同,因为合同的独立性同样重要。每一份合同或每个方面都应该独立存在,并独立于其他合同运作。
Would you make your marriage contract dependent on your employment contract? You would reject this contract because the independence of the contract is just as important. Each contract or facet should stand alone and operate independently of the other contracts.
这些属性还指导着获得合同的过程。你会花钱请房地产律师为你起草一份公寓租赁合同吗?还是你会在网上搜索公寓租赁合同,打印第一个搜索结果,填写地址和租金,然后就完事了?如果一份在线合同足以适用于数百万其他租赁,而无需具体到任何公寓(这确实是一项不小的成就),那么对你来说,它还不够好吗?合同必须发展到包括所有有凝聚力的细节,如租金,并避免租户工作地点等不一致的事情。它还必须独立于其他合同——也就是说,是一个真正的独立方面。
These attributes also guide the process of obtaining the contract. Would you pay a real estate lawyer to craft a contract just for you to rent your apartment? Or would you search the web for an apartment rental contract, print the first search result, fill in the blanks with the address and the rent, and be done with it? If an online contract is good enough for millions of other rentals without being specific to any apartment (which would truly be a nontrivial achievement), would it not be good enough for you? The contract must have evolved to include all the cohesive details such as rent and to avoid the inconsistent things like where the renters work. It must also be independent of other contracts—that is, a true stand-alone facet.
请注意,您并不是在寻找比其他人使用的更好的合同。您只是想重用其他人正在使用的相同合同。正是因为它如此可重用,所以它才是一个好的合同。最后的观察是,逻辑上一致、内聚和独立的合同是可重用的合同。
Note that you are not searching for a better contract than anyone else is using. You simply want to reuse the very same contract that everyone else is using. It is precisely because it is so reusable that it is a good contract. The final observation is that logically consistent, cohesive, and independent contracts are reusable contracts.
请注意,可重用性并不是契约的二元特性。每个契约都处于可重用性范围内的某个位置。契约的可重用性越高,其逻辑一致性、内聚性和独立性就越强。想象一下图 B-1中服务前面的契约。该契约非常庞大,并且针对特定服务非常具体。它在逻辑上肯定是不一致的,因为它是系统所做的一切的臃肿垃圾场。世界上其他人重用该服务契约的可能性基本为零。
Note that reusability is not a binary trait of a contract. Every contract lies somewhere on the spectrum of reusability. The more reusable the contract, the more it is logically consistent, cohesive, and independent. Imagine the contract in front of the service in Figure B-1. That contract is massive, and it is extremely specific for that particular service. It is certainly logically inconsistent because it is a bloated dumping ground for everything that the system does. The likelihood that anyone else in the world will ever reuse that service contract is basically zero.
接下来,想象一下图 B-2中某个微小服务的契约。该契约非常小,并且对于其上下文而言极其特殊。如此小的东西不可能具有凝聚力。同样,其他人重用该契约的可能性为零。
Next, imagine the contract on one of the tiny services in Figure B-2. That contract is minuscule and extremely specialized for its context. Something so small cannot possibly be cohesive. Again, the likelihood that anyone else will ever reuse that contract is zero.
图 B-3中的服务至少提供了一些希望。也许图 B-3中服务的契约已经发展到包括与其交互相关的所有内容——不多也不少。交互数量少也表明了独立方面。这些契约很可能是可重用的。
The services in Figure B-3 offer at least some hope. Perhaps the contracts on the services in Figure B-3 have evolved to include everything pertaining to their interactions—no more, no less. The small number of interactions also indicates independent facets. The contracts could very well be reusable.
一个重要的观察结果是,重用的基本要素是契约,而不是服务本身。例如,我用来写这本书的计算机鼠标与其他鼠标都不一样。它的每个部分都不是可重用的。鼠标的外壳是为这个特定的鼠标型号设计的,如果不进行昂贵的修改,我无法将它安装在任何其他鼠标上(同一型号的另一个实例除外)。但是,“鼠标 - 手”接口是可重用的;我可以操作那个鼠标,你也可以。您的鼠标支持完全相同的接口;换句话说,它重用了该接口。存在数千种不同的鼠标型号,然而,所有型号都重用相同的接口这一事实恰恰是良好接口的终极标志。事实上,接口“鼠标 - 手”应该称为“工具 - 手”(见图B-5)。
An important observation is that the basic element of reuse is the contract, not the service itself. For example, the computer mouse I use to write this book is unlike any other mouse. None of its parts is reusable. The case of the mouse was designed for this particular mouse model, and I cannot mount it on any other mouse (except another instance of the same model) without costly modification. However, the interface “mouse–hand” is reusable; I can operate that mouse, and so can you. Your mouse supports exactly the same interface; put differently, it reuses the interface. Many thousands of different mouse models exist, yet all of them reuse the same interface, and that is the ultimate indication of a good interface. In fact, the interface “mouse–hand” should be called “tool–hand” (see Figure B-5).
图 B-5重用界面 [图片灵感来自 Matt Ridley 的《理性乐观主义者:繁荣如何演变》(HarperCollins,2010 年)。图片:Mountainpix/Shutterstock;New Africa/Shutterstock。]
Figure B-5 Reusing interfaces [Figure inspired by Matt Ridley, The Rational Optimist: How Prosperity Evolves (HarperCollins, 2010). Images: Mountainpix/Shutterstock; New Africa/Shutterstock.]
自史前时代以来,人类就一直在重复使用“工具—手”界面。虽然石斧上的任何石粒都无法在鼠标中重复使用,鼠标上的任何电子元件都无法在石斧中重复使用,但两者都重复使用了相同的界面。好的界面是可重复使用的,而底层服务永远都不是。
Our species has been reusing the “tool–hand” interface since prehistoric times. While no grain of stone from the stone axe is reusable in the mouse, and no piece of electronics from the mouse is useful in the stone axe, both reuse the same interface. Good interfaces are reusable, while the underlying services never are.
在设计服务契约时,您必须始终从重用的角度考虑。这是确保即使在架构和分解之后,您的服务仍处于最低成本范围内的唯一方法。请注意,设计可重用契约的义务与某人是否最终会真正重用契约无关。其他方对契约的实际重用程度或需求完全无关紧要。您必须将契约设计为,它们将永久地在多个系统(包括您当前的系统和竞争对手的系统)中无数次重用。一个简单的例子就可以很好地证明这一点。
When designing the contracts for your services, you must always think in terms of elements of reuse. That is the only way to assure that even after architecture and decomposition, your services will remain in the area of minimum cost. Note that the obligation to design reusable contracts has nothing to do with whether someone will actually end up reusing the contracts. The degree of actual reuse or demand for the contract by other parties is completely immaterial. You must design the contracts as if they will be reused countless times in perpetuity, across multiple systems including your current one and those of your competitors. A simple example will go a long way to demonstrate this point.
假设您需要实现一个软件系统来运行销售点收银机。该系统的要求可能包括查询商品价格、与库存集成、接受付款和跟踪忠诚度计划等用例。所有这些都可以使用方法和适当的管理器、引擎等轻松完成。为了便于说明,假设系统需要连接到条形码扫描仪并用它读取商品的标识符。条形码扫描仪设备只不过是系统的资源,因此您需要为相应的ResourceAccess服务设计服务契约。条形码扫描仪访问服务的要求是它应该能够扫描商品的代码、调整扫描光束的宽度以及通过打开和关闭端口来管理与扫描仪的通信端口。您可以IScannerAccess像这样定义服务契约:
Suppose you need to implement a software system for running a point-of-sale register. The requirements for the system likely have use cases for looking up an item’s price, integrating with inventory, accepting payment, and tracking loyalty programs, among others. All of this can easily be done using The Method and the appropriate Managers, Engines, and so on. For illustration purposes, suppose the system needs to connect to a barcode scanner and read an item’s identifier with it. The barcode scanner device is nothing more than a Resource to the system, so you need to design the service contract for the corresponding ResourceAccess service. The requirements for the barcode scanner access service are that it should be able to scan an item’s code, adjust the width of the scanning beam, and manage the communication port to the scanner by opening and closing the port. You could define the IScannerAccess service contract like so:
interface IScannerAccess
{
    long ScanCode();
    void AdjustBeam();
    void OpenPort();
    void ClosePort();
}
服务IScannerAccess契约支持扫描仪所需的功能。这可轻松使不同类型的服务提供商(例如BarcodeService和 )QRCodeService实现IScannerAccess契约:
The IScannerAccess service contract supports the required features of a scanner. This easily enables different types of service providers, such as BarcodeScanner and QRCodeScanner, to implement the IScannerAccess contract:
class BarcodeScanner : IScannerAccess
{...}
class QRCodeScanner : IScannerAccess
{...}
您可能会感到满足,因为您已经IScannerAccess在多个服务中重用了服务合同。
You may feel content because you have reused the IScannerAccess service contract across multiple services.
一段时间后,零售商联系您并提出以下问题:在某些情况下,最好使用其他设备(例如数字键盘)来输入商品代码。但是,IScannerAccess合约假设底层设备使用某种光学扫描仪。因此,它无法管理非光学设备,例如数字键盘或射频识别 (RFID) 读取器。从重用的角度来看,最好抽象实际的读取机制并将扫描操作重命名为读取操作。毕竟,硬件设备使用哪种机制来读取商品代码应该与系统无关。您还应该将合约重命名为,IReaderAccess并确保合约设计中没有任何内容阻止所有类型的代码读取器重用合约。例如,该AdjustBeam()操作对于键盘来说是没有意义的。最好将原始合约拆分IScannerAccess为两个合约,并分解有问题的操作:
Sometime later, the retailer contacts you with the following issue: In some cases it is better to use other devices, such as a numerical keypad, to enter the item code. However, the IScannerAccess contract assumes the underlying device uses some kind of optical scanner. As such, it is unable to manage non-optical devices such as numerical keypads or radio frequency identification (RFID) readers. From a reuse perspective, it is better to abstract the actual reading mechanism and rename the scanning operation to a reading operation. After all, which mechanism the hardware device uses to read the item code should be irrelevant to the system. You should also rename the contract to IReaderAccess and ensure there is nothing in the contract’s design that precludes any type of code reader from reusing the contract. For example, the AdjustBeam() operation is meaningless for a keypad. It is better to break up the original IScannerAccess into two contracts, and factor down the offending operation:
interface IReaderAccess
{
    long ReadCode();
    void OpenPort();
    void ClosePort();
}
interface IScannerAccess : IReaderAccess
{
    void AdjustBeam();
}
这样就可以正确地重用IReaderAccess:
This now enables proper reuse of IReaderAccess:
class BarcodeScanner : IScannerAccess
{...}
class QRCodeScanner : IScannerAccess
{...}
class KeypadReader : IReaderAccess
{...}
class RFIDReader : IReaderAccess
{...}
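To see what the split buys a client, here is a minimal sketch of a routine written only against IReaderAccess. The Register class, the stub bodies, and the placeholder item codes are hypothetical and exist only for illustration; the two contracts are the ones defined above.

```csharp
using System;

// Contracts as factored down in the text.
interface IReaderAccess
{
    long ReadCode();
    void OpenPort();
    void ClosePort();
}

interface IScannerAccess : IReaderAccess
{
    void AdjustBeam();
}

// Hypothetical stub: an optical device supports the derived contract.
class BarcodeScanner : IScannerAccess
{
    public long ReadCode() => 1001;   // placeholder item code
    public void AdjustBeam() { }
    public void OpenPort() { }
    public void ClosePort() { }
}

// Hypothetical stub: a keypad has no beam, so it supports only the base contract.
class KeypadReader : IReaderAccess
{
    public long ReadCode() => 2002;   // placeholder item code
    public void OpenPort() { }
    public void ClosePort() { }
}

// Hypothetical client: depends only on IReaderAccess, never on the device type.
static class Register
{
    public static long ReadItem(IReaderAccess reader)
    {
        reader.OpenPort();
        try
        {
            return reader.ReadCode();
        }
        finally
        {
            // The port is closed even if the read fails.
            reader.ClosePort();
        }
    }
}
```

Register.ReadItem accepts a BarcodeScanner and a KeypadReader alike, because both support the base contract; only code that needs AdjustBeam() has to ask for IScannerAccess.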
完成更改后,又过了一段时间,零售商决定让软件控制连接到销售点工作站的传送带。这需要软件启动和停止传送带,以及管理其通信端口。虽然传送带使用与读取设备相同类型的通信端口,但传送带无法重复使用,IReaderAccess因为合同不支持传送带,传送带无法读取代码。此外,还有一长串这样的外围设备,每个设备都有自己的功能,每引入一个都会重复其他合同的部分内容。
With that change done, more time passes, and the retailer decides to have the software also control the conveyer belt attached to the point-of-sale workstation. This requires the software to start and stop the belt, as well as manage its communication port. While the conveyer belt uses the same kind of communication port as the reading devices, the belt cannot reuse IReaderAccess because the contract does not support a conveyer belt, and the belt cannot read codes. Furthermore, there is a long list of such peripheral devices, each with its own functionality, and the introduction of every one of them will duplicate parts of the other contracts.
请注意,业务领域的每一次变化都会导致系统领域的相应变化。这是糟糕设计的标志。良好的系统设计应该能够适应业务领域的变化。
Observe that every change in the business domain leads to a reflected change in the system’s domain. This is the hallmark of a bad design. A good system design should be resilient to changes in the business domain.
根本问题是IReaderAccess是一个设计不良的合约。尽管所有操作都是读取器应该支持的操作,但ReadCode()它在逻辑上与OpenPort()和并不相关ClosePort()。读取操作涉及设备的一个方面,即作为代码提供者,这对零售商的业务至关重要(它是一个原子业务操作),而端口管理涉及与作为通信设备的实体相关的另一个方面。在这方面, 在IReaderAccess逻辑上不一致:它只是服务的所有要求的杂锦。更像图 B-1IReaderAccess中的设计。
The root problem is that IReaderAccess is a poorly designed contract. Even though all the operations are things a reader should support, ReadCode() is not logically related to OpenPort() and ClosePort(). The reading operation involves one facet of the device, as a provider of codes, something that is essential to the business of the retailer (it is an atomic business operation), while port management involves a different facet, relating to the entity as a communication device. In this regard, IReaderAccess is not logically consistent: It is a mere grab bag of every requirement for the service. IReaderAccess is more like the design in Figure B-1 than anything else.
OpenPort()一个更好的方法是将和操作分解ClosePort()到一个名为的单独合约中ICommunicationDevice:
A better approach is to factor sideways the OpenPort() and ClosePort() operations to a separate contract called ICommunicationDevice:
interface ICommunicationDevice
{
    void OpenPort();
    void ClosePort();
}
interface IReaderAccess
{
    long ReadCode();
}
实施服务必须支持这两个合同:
The implementing services will have to support both contracts:
class BarcodeScanner : IScannerAccess, ICommunicationDevice
{...}
请注意,里面的工作量BarcodeScanner与原始的完全相同IScannerAccess。但是,由于通信方面独立于读取方面,其他实体(例如皮带)可以重用ICommunicationDevice服务契约并支持它:
Note that the sum of work inside BarcodeScanner is exactly the same as with the original IScannerAccess. However, because the communication facet is independent of the reading facet, other entities (such as belts) can reuse the ICommunicationDevice service contract and support it:
interface IBeltAccess
{
    void Start();
    void Stop();
}
class ConveyerBelt : IBeltAccess, ICommunicationDevice
{...}
这种设计允许您将设备的通信管理方面与实际设备类型(条形码阅读器或传送带)分离。
This design allows you to decouple the communication–management aspect of the devices from the actual device type (be it barcode readers or conveyer belts).
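The decoupling can be sketched as a routine that manages ports without knowing what kind of device it is talking to. PortSetup and the empty stub bodies are hypothetical, introduced only for this illustration; the contracts follow the listings above.

```csharp
using System;
using System.Collections.Generic;

// Communication facet, factored sideways as in the text.
interface ICommunicationDevice
{
    void OpenPort();
    void ClosePort();
}

interface IReaderAccess
{
    long ReadCode();
}

interface IBeltAccess
{
    void Start();
    void Stop();
}

// Hypothetical stubs: each device supports its business facet plus the communication facet.
class KeypadReader : IReaderAccess, ICommunicationDevice
{
    public long ReadCode() => 0;   // placeholder value
    public void OpenPort() { }
    public void ClosePort() { }
}

class ConveyerBelt : IBeltAccess, ICommunicationDevice
{
    public void Start() { }
    public void Stop() { }
    public void OpenPort() { }
    public void ClosePort() { }
}

// Hypothetical helper: sees every device only as an ICommunicationDevice.
static class PortSetup
{
    // Opens the port of each device and returns how many ports were opened.
    public static int OpenAll(IEnumerable<ICommunicationDevice> devices)
    {
        int opened = 0;
        foreach (ICommunicationDevice device in devices)
        {
            device.OpenPort();
            opened++;
        }
        return opened;
    }
}
```

Adding another peripheral in this sketch means adding one class that supports ICommunicationDevice; PortSetup itself does not change.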
销售点系统的真正问题不是读取设备的具体细节,而是连接到系统的设备类型的波动性。您的架构应该依赖于基于波动性的分解。正如这个简单示例所示,该原则也延伸到单个服务的合同设计。
The real issue with the point-of-sale system was not the specifics of the reading devices, but rather the volatility of the type of devices connected to the system. Your architecture should rely on volatility-based decomposition. As this simple example shows, the principle extends to the contract design of individual services as well.
当合约中的操作之间存在弱逻辑关系时,通常需要将操作分解到单独的合约中(例如ICommunicationDevice) 。IReaderAccess
Factoring operations into separate contracts (like ICommunicationDevice out of IReaderAccess) is usually called for whenever there is a weak logical relation between the operations in the contract.
有时,在几个不相关的合同中会发现相同的操作,这些操作在逻辑上与它们各自的合同相关。不包括它们会使合同的凝聚力降低。例如,假设出于安全原因,系统必须立即中止所有设备。此外,所有设备都必须支持某种诊断,以确保它们在安全范围内运行。从逻辑上讲,中止与读取一样是扫描仪操作,与启动或停止一样是皮带操作。
Sometimes identical operations are found in several unrelated contracts, and these operations are logically related to their respective contracts. Not including them would make the contract less cohesive. For example, suppose that for safety reasons, the system must immediately abort all devices. In addition, all devices must support some kind of diagnostics that assures they operate within safe limits. Logically, aborting is just as much a scanner operation as reading, and just as much a belt operation as starting or stopping.
在这种情况下,你可以将服务合同分解为一个合同层次结构,而不是单独的合同:
In such cases, you can factor the service contracts up, into a hierarchy of contracts instead of separate contracts:
interface IDeviceControl
{
    void Abort();
    long RunDiagnostics();
}
interface IReaderAccess : IDeviceControl
{...}
interface IBeltAccess : IDeviceControl
{...}
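As a rough sketch of what factoring up enables, a safety routine can abort every device through the base contract alone. SafetyStop, the stub classes, and their placeholder return values are hypothetical; the text does not say what RunDiagnostics() returns, so the stubs simply return 0.

```csharp
using System;
using System.Collections.Generic;

// Base contract factored up from the device contracts, as in the text.
interface IDeviceControl
{
    void Abort();
    long RunDiagnostics();
}

interface IReaderAccess : IDeviceControl
{
    long ReadCode();
}

interface IBeltAccess : IDeviceControl
{
    void Start();
    void Stop();
}

// Hypothetical stubs; bodies and return values are placeholders.
class KeypadReader : IReaderAccess
{
    public long ReadCode() => 0;
    public void Abort() { }
    public long RunDiagnostics() => 0;
}

class ConveyerBelt : IBeltAccess
{
    public void Start() { }
    public void Stop() { }
    public void Abort() { }
    public long RunDiagnostics() => 0;
}

// Hypothetical safety routine: sees every device only through IDeviceControl.
static class SafetyStop
{
    // Aborts each device and returns how many devices were told to abort.
    public static int AbortAll(IEnumerable<IDeviceControl> devices)
    {
        int aborted = 0;
        foreach (IDeviceControl device in devices)
        {
            device.Abort();
            aborted++;
        }
        return aborted;
    }
}
```

Unlike the sideways factoring into ICommunicationDevice, the device classes here list a single contract each, because Abort() and RunDiagnostics() arrive through the hierarchy.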
三种契约设计技术(分解为派生契约、横向分解为新契约或分解为基本契约)可产生经过微调、更小且更可重用的契约。拥有更多可重用的契约当然是有好处的,而且在开始时契约臃肿时,较小的契约是必要的。但好事过头就是坏事。风险在于,如果你继续这样做,最终会得到过于细化和碎片化的契约,如图B-2所示。因此,你需要平衡两种相反的力量:实施服务契约的成本与将它们组合在一起的成本。取得平衡的方法是使用设计指标。
The three contract design techniques (factoring down to a derived contract, factoring sideways to a new contract, or factoring up to a base contract) result in fine-tuned, smaller, and more reusable contracts. Having more reusable contracts is certainly a benefit, and the smaller contracts are necessary when starting with bloated contracts. But too much of a good thing is a bad thing. The risk is that you keep doing this until eventually you end up with contracts that are too granular and fragmented, as in Figure B-2. You therefore need to balance the two opposing forces: the cost of implementing the service contracts versus the cost of putting them together. The way to strike the balance is to use design metrics.
可以衡量契约,并按从最差到最好的顺序排列它们。例如,你可以衡量代码的圈复杂度。你不太可能对大型复杂契约进行简单的实现,而过于精细的契约的复杂性将是可怕的。你可以衡量与底层服务相关的缺陷:低质量的服务可能是糟糕契约复杂性的结果。你可以衡量每个契约在系统中被重用的次数,以及契约被签出和更改的次数:显然,在任何地方都被重用且从未改变的契约是一份好契约。你可以为这些测量分配权重并对结果进行排序。多年来,我已经在不同的技术堆栈、系统、行业和团队中进行过这样的测量。尽管存在这种多样性,但已经出现了一些统一的指标,这些指标对于衡量契约的质量很有价值。
It is possible to measure contracts and rank them from worst to best. For example, you could measure the cyclomatic complexity of the code. You are unlikely to have a simple implementation of a large complex contract, and the complexity of overly granular contracts would be horrendous. You could measure the defects associated with the underlying services: Low-quality services are likely the result of the complexity of poor contracts. You could measure how many times each contract is reused in the system, and how many times a contract was checked out and changed: Clearly a contract that is reused everywhere and has never changed is a good contract. You could assign weights to these measurements and rank the results. I have conducted such measurements for years across different technology stacks, systems, industries, and teams. Regardless of this diversity, some uniform metrics have emerged that are valuable in gauging the quality of contracts.
仅包含一个操作的服务契约是可行的,但您应避免使用它们。服务契约是实体的一个方面,如果您仅用一个操作来表达它,那么这个方面一定非常枯燥。检查该单个操作并问自己一些问题。它是否使用了太多参数?它是否太粗糙,以至于您应该将单个操作分解为多个操作?您是否应该将此操作分解到现有的服务契约中?它是否最好位于要构建的下一个子系统中?我无法告诉您应该采取哪些纠正措施,但我可以告诉您,仅包含一个操作的契约是一个危险信号,您必须进一步调查。
Service contracts with just one operation are possible, but you should avoid them. A service contract is a facet of an entity, and that facet must be pretty dull if you can express it with just one operation. Examine that single operation and ask yourself some questions about it. Does it use too many parameters? Is it too coarse, so that you should factor the single operation into several operations? Should you factor this operation into an existing service contract? Is it something that should best reside in the next subsystem to be built? I cannot tell you which corrective action to take, but I can tell you that a contract with just one operation is a red flag, and you must investigate it further.
服务契约操作的最佳数量在 3 到 5 之间。如果您设计的服务契约包含更多操作,比如 6 到 9 个,那么您仍然做得相对较好,但是您已经开始偏离图 B-4中的最低成本区域。查看这些操作并确定是否可以将任何操作合并到其他操作中,因为很有可能过度分解操作。如果服务契约包含 12 个或更多操作,则很可能是设计不当。您应该寻找将操作分解为单独的服务契约或契约层次结构的方法。您必须立即拒绝包含 20 个或更多操作的契约,因为不可能存在此类契约是良性的。这样的契约肯定会掩盖一些严重的设计错误。您必须对大型契约尽量不要容忍,因为它们对开发和维护成本的影响是非线性的。
The optimal number of service contract operations is between 3 and 5. If you design a service contract with more operations, perhaps 6 to 9, you are still doing relatively well, but you have started to drift away from the area of minimum cost in Figure B-4. Take a look at the operations and determine whether any can be collapsed into other operations, since it is quite possible to over-factor operations. If the service contract has 12 or more operations, it is very likely a poor design. You should look for ways to factor the operations into either separate service contracts or a hierarchy of contracts. You must immediately reject contracts with 20 operations or more, as there are no possible circumstances where such contracts are benign. Such a contract is certain to plaster over some grave design mistake. You must have little tolerance for large contracts because of their nonlinear effects on development and maintenance costs.
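The size ranges above could be checked mechanically. The following is only a sketch: ContractMetrics and ContractSize are hypothetical names, counting only the operations declared directly on the contract is an assumption, and the counts the text does not rule on (0, 2, 10, and 11) are reported as Unspecified rather than guessed.

```csharp
using System;
using System.Reflection;

// First version of the scanner contract from the example: four operations.
interface IScannerAccess
{
    long ScanCode();
    void AdjustBeam();
    void OpenPort();
    void ClosePort();
}

// Hypothetical verdicts mirroring the ranges in the text.
enum ContractSize { SingleOperation, Optimal, Acceptable, Poor, Reject, Unspecified }

static class ContractMetrics
{
    // Counts the operations declared directly on the contract.
    // Assumption: operations inherited from base contracts are not counted.
    public static int CountOperations(Type contract)
    {
        return contract.GetMethods(
            BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly).Length;
    }

    public static ContractSize Classify(int operations)
    {
        if (operations >= 20) return ContractSize.Reject;          // must be rejected
        if (operations >= 12) return ContractSize.Poor;            // very likely a poor design
        if (operations >= 6 && operations <= 9) return ContractSize.Acceptable;
        if (operations >= 3 && operations <= 5) return ContractSize.Optimal;
        if (operations == 1) return ContractSize.SingleOperation;  // red flag, investigate
        return ContractSize.Unspecified;                           // 0, 2, 10, 11: no verdict in the text
    }
}
```

Classifying the first version of IScannerAccess this way yields Optimal, which is a reminder that a size check can flag a bad contract but cannot certify a good one.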
有趣的是,在人类世界中,你总是使用合同大小指标来评估合同的质量。例如,你会签署只有一句话的雇佣合同吗?你会拒绝这份合同,因为一句话(甚至一个段落)不可能涵盖你作为雇员的所有方面。这样的合同肯定会遗漏责任或终止等关键细节,并且可能包含你不熟悉的其他合同。另一个极端是,你会签署一份包含 2000 页的雇佣合同吗?无论它承诺了什么,你甚至都不会费心去阅读它。即使是 20 页的合同也会引起担忧:如果雇佣性质需要这么多页,那么合同可能会很繁琐和复杂。但如果合同有 3-5 页,你可能不会签署,但你会仔细阅读。从重用的角度来看,请注意,雇主可能会为你提供与其他所有员工相同的合同。除了完全重用之外的任何事情都会令人担忧。
Interestingly, in the human world you always use contract size metrics to assess the quality of a contract. For example, would you sign an employment contract that has just one sentence? You would decline this contract because there is no way that a single sentence (or even a single paragraph) could capture all the aspects of you as an employee. Such a contract is certain to leave out crucial details such as liability or termination and may incorporate other contracts with which you are unfamiliar. On the other extreme, would you sign an employment contract containing 2000 pages? You would not even bother to read it, regardless of what it promises. Even a 20-page contract is cause for concern: If the nature of the employment requires so many pages, the contract is likely taxing and complex. But if the contract has 3–5 pages, you may not sign it, but you will read it carefully. From a reuse perspective, note that the employer will likely furnish you with the same contract as all other employees have. Anything other than total reuse would be alarming.
许多服务开发堆栈故意在契约定义中不包含属性语义,但您可以通过创建类似属性的操作轻松规避这些语义,如下所示:
Many service development stacks deliberately do not have property semantics in contract definitions, but you can easily circumvent those by creating property-like operations, such as the following:
string GetName();
void SetName(string name);
在服务契约的上下文中,避免使用属性和类似属性的操作。属性暗示了状态和实现细节。当服务公开属性时,客户端知道这些细节,并且当服务更改时,客户端(或客户端)也会随之更改。您不应该让客户端为属性的使用或甚至对属性的了解而烦恼。良好的服务契约允许客户端调用抽象操作而不关心实际实现。客户端只需调用操作,让服务担心如何管理其状态。
In the context of service contracts, avoid properties and property-like operations. Properties imply state and implementation details. When the service exposes properties, the client knows about such details, and when the service changes, the client (or clients) must change along with it. You should not bother clients with the use of properties or even the knowledge of them. Good service contracts allow clients to invoke abstract operations without caring about the actual implementation. The clients simply invoke operations and let the service worry about how to manage its state.
服务提供者和服务消费者之间的良好交互始终是行为上的。这种交互应以 的形式表达DoSomething(),例如Abort()。服务如何做到这一点不应该是客户关心的问题。这也模仿了现实生活:说总比问好。
A good interaction between a service provider and a service consumer is always behavioral. That interaction should be phrased in terms of DoSomething(), such as Abort(). How the service goes about doing that should be of no concern to the client. This, too, mimics real life: It is always better to tell than to ask.
在任何分布式系统中,避免使用属性也是一种很好的做法。最好将数据保存在数据所在的位置,并且只对其调用操作。
Avoiding properties is also a good practice in any distributed system. It is always preferable to keep the data where the data is, and only invoke operations on it.
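The contrast between asking and telling can be sketched with the conveyer belt from the earlier example. IBeltProperties is a hypothetical property-like contract written only to show what to avoid, and the ToString() override is a diagnostic aid on the class for this illustration, not part of any contract.

```csharp
using System;

// Property-like contract (what the text advises against):
// clients read and write the belt's state directly.
interface IBeltProperties
{
    bool GetIsRunning();
    void SetIsRunning(bool running);
}

// Behavioral contract from the text: clients tell the belt what to do.
interface IBeltAccess
{
    void Start();
    void Stop();
}

class ConveyerBelt : IBeltAccess
{
    // The state stays inside the service; no contract exposes it.
    bool m_Running;

    public void Start() { m_Running = true; }
    public void Stop() { m_Running = false; }

    // Diagnostic aid for this sketch only.
    public override string ToString() => m_Running ? "running" : "stopped";
}
```

A client of IBeltAccess never learns that the belt keeps a Boolean flag, so the belt is free to change how it tracks its state without affecting any client.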
一个服务不应该支持超过一两个契约。由于契约是服务的独立方面,如果服务支持三个或更多独立的业务方面,则表明该服务可能太大。
A service should not support more than one or two contracts. Since contracts are independent facets of the service, if the service supports three or more independent business aspects, it suggests the service may be too big.
有趣的是,你可以使用第 7 章的估算技术得出每项服务的合同数量。如果只使用数量级,每项服务的合同数量应该是 1、10、100 还是 1000?显然,100 或 1000 份合同是一种糟糕的设计,甚至 10 份合同似乎也非常多。因此,按数量级计算,每项服务的合同数量为 1。使用“2 的因子”技术,你可以进一步缩小范围:合同数量更像 1、2 还是 4?它不可能是 8,因为这几乎是 10,这已经被排除了。因此,每项服务的合同数量在 1 到 4 之间。这仍然是一个很大的范围。为了减少不确定性,你可以使用 PERT 技术,其中 1 作为最低估计,4 作为最高估计,2 为可能的数字。PERT 计算得出每项服务的合同数量为 2.2:
Interestingly, you can derive the number of contracts per service using the estimation techniques of Chapter 7. Using only orders of magnitude, should the number of contracts per service be 1, 10, 100, or 1000? Clearly, 100 or 1000 contracts is a poor design, and even 10 contracts seems very large. So, in order of magnitude, the number of contracts per service is 1. Using the “factor of 2” technique, you can narrow the range further: Is the number of contracts more like 1, 2, or 4? It cannot be 8 because that is almost 10, which is already ruled out. So the number of contracts per service is between 1 and 4. This is still a wide range. To reduce the uncertainty, you can use the PERT technique, with 1 as the lowest estimation, 4 as the highest, and 2 as the likely number. The PERT calculation yields 2.2 as the number of contracts per service:
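Assuming the standard three-point PERT formula, with O as the lowest estimation, M as the likely number, and P as the highest estimation (the symbols are chosen here for illustration), the calculation works out as follows:

```latex
E = \frac{O + 4M + P}{6} = \frac{1 + 4 \cdot 2 + 4}{6} = \frac{13}{6} \approx 2.2
```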
实际上,在设计良好的系统中,我研究过的大多数服务只有一两个契约,最常见的情况是只有一个契约。在拥有两个或更多契约的服务中,额外的契约几乎总是与业务无关的契约,它们涵盖了安全性、持久性或检测等方面,并且这些契约在其他服务中被重用。
In practice, in well-designed systems, the majority of services I have examined had only one or two contracts, with a single contract as the more common case. Of the services with two or more contracts, the additional contracts were almost always non-business-related contracts that captured aspects such as security, safety, persistence, or instrumentation, and those contracts were reused across other services.
服务契约设计指标是评估工具,而不是验证工具。符合指标并不意味着您的设计良好,但违反指标则意味着您的设计糟糕。例如,考虑 的第一个版本IScannerAccess。该服务契约有 4 个操作,正好处于 3 到 5 个操作指标范围的中间,但契约在逻辑上不一致。
The service contract design metrics are evaluation tools, not validation tools. Complying with the metrics does not mean you have a good design—but violating the metric implies you have a bad design. As an example, consider the first version of IScannerAccess. That service contract has 4 operations, right in the middle of the range of the 3 to 5 operations metric, yet the contract was logically inconsistent.
避免试图设计以满足指标。与任何设计任务一样,服务契约设计本质上是迭代的。花必要的时间来确定您的服务应该公开的可重用契约,不要担心指标。如果您违反了指标,请继续努力,直到您拥有合适的契约。继续检查不断发展的契约,看看它们是否可以跨系统和项目重用。问问自己,这些契约在逻辑上是否一致、有凝聚力且独立。一旦您设计了这样的契约,您就会发现它们符合指标。
Avoid trying to design to meet the metrics. Like any design task, service contract design is iterative in nature. Spend the time necessary to identify the reusable contract your service should expose, and do not worry about the metrics. If you violate the metrics, keep working until you have decent contracts. Keep examining the evolving contracts to see if they are reusable across systems and projects. Ask yourself if the contracts are logically consistent, cohesive, and independent facets. Once you have devised such contracts, you will see that they match the metrics.
本附录中讨论的理念和技术直截了当、不言而喻且简单易懂。设计合同是一项后天习得的技能,通过练习可以快速正确地完成它。然而,“简单”和“过于简单”之间有很大区别。虽然本附录中的想法很简单,但它们远非过于简单。事实上,生活中充满了简单但不过于简单的想法。例如,你可能希望自己健康。这是一个简单的想法,可能涉及改变你的饮食、生活方式、日常生活甚至工作——这些都不是过于简单。
The ideas and techniques discussed in this appendix are straightforward, self-evident, and simple. Designing contracts is an acquired skill, and practice goes a long way toward getting it done quickly and correctly. However, there is a big difference between “simple” and “simplistic.” While the ideas in this appendix are simple, they are far from simplistic. Indeed, life is full of ideas that are simple but not simplistic. For example, you may wish to be healthy. That is a simple idea that may involve changes to your diet, lifestyle, daily routine, and even work—none of which is simplistic.
制定可重用服务契约是一项耗时且需要深思熟虑的任务。制定正确的契约绝对是至关重要的,否则您将面临非线性的更糟糕的问题(见图B-4)。真正的挑战不是设计契约(这很简单),而是获得管理层的支持。大多数经理都没有意识到错误的契约设计会带来什么后果。如果仓促实施,他们会导致项目失败。初级人员交接尤其如此(见第 14 章)。
Coming up with reusable service contracts is a time-consuming, highly contemplative task. It is absolutely paramount to get the contracts right, or you will face a nonlinearly worse problem (see Figure B-4). The real challenge is not designing the contracts (which is simple enough), but rather getting management’s support to do so. Most managers are unaware of the consequences of incorrect contract design. By rushing to implementation, they will cause the project to fail. This is especially the case with a junior hand-off (see Chapter 14).
即使是高级开发人员也可能需要指导才能正确设计契约,而你作为架构师,可以指导和培训他们。这将使你能够将契约设计作为每个服务生命周期的一部分。对于初级团队,你不能相信开发人员会提出正确的可重用契约;最有可能的是,他们会提出类似于图 B-1或图 B-2 的服务契约。你必须使用第 14 章的方法,在工作开始前分配时间来设计契约,或者最好使用一些高级熟练的开发人员在当前服务集的构建活动的同时设计下一组服务的契约(参见图 14-6 )。你应该使用本附录和图 B-4的概念来教育你的经理如何真正交付设计良好的服务。
Even senior developers may require mentorship to be able to design contracts correctly, and you, as the architect, can guide and train them. This will enable you to make the contract design part of each service life cycle. With a junior team, you cannot trust the developers to come up with correct reusable contracts; most likely, they will come up with service contracts resembling Figure B-1 or Figure B-2. You must use the approach of Chapter 14 to either carve out the time to design the contracts before work begins or, preferably, use a few skilled senior developers to design the contracts of the next set of services in parallel to the construction activities for the current set of services (see Figure 14-6). You should use the concepts of this appendix and Figure B-4 to educate your manager on what it really takes to ship well-designed services.
本书中的概念简单且在内部和与其他所有工程学科都一致。然而,一开始要接受这种关于系统和项目设计的新思维方式可能会让人不知所措。随着时间的推移和实践,应用这些想法将成为第二天性。为了便于吸收所有这些内容,本附录提供了一个简明的设计标准。设计标准将本书中的所有设计规则以清单的形式集中在一处。清单本身意义不大,因为您仍然必须了解每一项的上下文。然而,参考标准可以确保您不会遗漏重要的属性或考虑因素。这使得该标准成为成功的系统和项目设计所必需的,因为它可以帮助您实施最佳实践并避免陷阱。
The ideas in this book are simple and consistent both internally and with every other engineering discipline. However, it can be overwhelming at first to come to terms with this new way of thinking about system and project design. Over time and with practice, applying these ideas becomes second nature. To facilitate absorbing them all, this appendix offers a concise design standard. The design standard captures all the design rules from this book in one place as a checklist. The list on its own will not mean much, because you still have to know the context for each item. Nevertheless, referring to the standard can ensure that you do not omit an important attribute or consideration. This makes the standard essential for successful system and project design by helping you enforce the best practices and avoid the pitfalls.
标准包含两种类型的项目:指令和指南。指令是您永远不应违反的规则,因为违反指令必定会导致项目失败。指南是您应该遵循的建议,除非您有强烈且不寻常的理由违反它。仅仅违反指南并不一定会导致项目失败,但违反太多次将导致项目失败。如果您遵守指令,您也不太可能有任何理由违反指南。
The standard contains two types of items: directives and guidelines. A directive is a rule that you should never violate, since doing so is certain to cause the project to fail. A guideline is a piece of advice that you should follow unless you have a strong and unusual justification for going against it. Violating a guideline alone is not certain to cause the project to fail, but too many violations will tip the project into failure. If you abide by the directives, you are also unlikely to have any reason to go against the guidelines.
永远不要违背要求进行设计。
Never design against the requirements.
避免功能分解。
Avoid functional decomposition.
根据波动性分解。
Decompose based on volatility.
提供可组合的设计。
Provide a composable design.
提供功能作为集成而非实施的一个方面。
Offer features as aspects of integration, not implementation.
迭代设计,逐步构建。
Design iteratively, build incrementally.
设计项目以构建系统。
Design the project to build the system.
利用根据时间表、成本和风险而不同的可行选项来做出明智的决策。
Drive educated decisions with viable options that differ by schedule, cost, and risk.
沿着关键路径构建项目。
Build the project along its critical path.
整个项目都要按时完成。
Be on time throughout the project.
要求
捕获所需的行为,而不是所需的功能。
使用用例描述所需的行为。
使用活动图记录所有包含嵌套条件的用例。
消除伪装成需求的解决方案。
通过确保系统设计支持所有核心用例来验证系统设计。
Requirements
Capture required behavior, not required functionality.
Describe required behavior with use cases.
Document all use cases that contain nested conditions with activity diagrams.
Eliminate solutions masquerading as requirements.
Validate the system design by ensuring it supports all core use cases.
基数
避免在没有子系统的系统中设置超过 5 个管理器。
避免使用过多的子系统。
避免每个子系统有超过三个管理器。
努力实现发动机与管理器的黄金比例。
如果有必要,允许ResourceAccess组件访问多个资源。
Cardinality
Avoid more than five Managers in a system without subsystems.
Avoid more than a handful of subsystems.
Avoid more than three Managers per subsystem.
Strive for a golden ratio of Engines to Managers.
Allow ResourceAccess components to access more than one Resource if necessary.
属性
波动性应该自上而下地降低。
重用应自上而下地增加。
不要封装业务性质的改变。
管理人员几乎应该是可有可无的。
设计应该是对称的。
切勿使用公共通信渠道进行内部系统交互。
Attributes
Volatility should decrease top-down.
Reuse should increase top-down.
Do not encapsulate changes to the nature of the business.
Managers should be almost expendable.
Design should be symmetric.
Never use public communication channels for internal system interactions.
图层
避免开放式建筑。
避免半封闭/半开放式建筑。
更喜欢封闭式架构。
请勿打电话。
请勿横向通话(经理之间的排队通话除外)。
请勿向下调用超过一层。
通过使用排队调用或异步事件发布来解决开放架构的尝试。
通过实现子系统来扩展系统。
Layers
Avoid open architecture.
Avoid semi-closed/semi-open architecture.
Prefer a closed architecture.
Do not call up.
Do not call sideways (except queued calls between Managers).
Do not call more than one layer down.
Resolve attempts at opening the architecture by using queued calls or asynchronous event publishing.
Extend the system by implementing subsystems.
交互规则
所有组件都可以调用实用程序。
管理器和引擎可以调用ResourceAccess。
管理人员可以致电引擎。
经理可以将呼叫排队至另一位经理。
Interaction rules
All components can call Utilities.
Managers and Engines can call ResourceAccess.
Managers can call Engines.
Managers can queue calls to another Manager.
互动禁忌
客户端不会在同一个用例中调用多个管理器。
在同一用例中,管理器不会将呼叫排队到多个管理器。
引擎不接收排队呼叫。
ResourceAccess组件不接收排队调用。
客户端不发布事件。
引擎不发布事件。
ResourceAccess组件不发布事件。
资源不发布事件。
Engines、ResourceAccess和Resources不订阅事件。
Interaction don’ts
Clients do not call multiple Managers in the same use case.
Managers do not queue calls to more than one Manager in the same use case.
Engines do not receive queued calls.
ResourceAccess components do not receive queued calls.
Clients do not publish events.
Engines do not publish events.
ResourceAccess components do not publish events.
Resources do not publish events.
Engines, ResourceAccess, and Resources do not subscribe to events.
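The interaction rules and don'ts can be read as a small lookup, sketched below. The Layer enum and the InteractionRules class are hypothetical names. Two entries are inferred rather than listed as rules: Clients calling Managers is implied by the first don't, and ResourceAccess calling Resources is implied by the cardinality guideline. The event methods only report what the list does not prohibit.

```csharp
using System;

// Component types named in the checklist.
enum Layer { Client, Manager, Engine, ResourceAccess, Resource, Utility }

static class InteractionRules
{
    // Direct (non-queued) calls.
    public static bool CanCall(Layer caller, Layer callee)
    {
        switch (callee)
        {
            case Layer.Utility:        return true;   // all components can call Utilities
            case Layer.ResourceAccess: return caller == Layer.Manager || caller == Layer.Engine;
            case Layer.Engine:         return caller == Layer.Manager;
            case Layer.Manager:        return caller == Layer.Client;          // inferred, see lead-in
            case Layer.Resource:       return caller == Layer.ResourceAccess;  // inferred, see lead-in
            default:                   return false;  // nothing calls up to a Client
        }
    }

    // Queued calls: only Manager to Manager; Engines and ResourceAccess never receive them.
    public static bool CanQueueCall(Layer caller, Layer callee)
    {
        return caller == Layer.Manager && callee == Layer.Manager;
    }

    // True when the list does not prohibit the component from publishing events.
    public static bool MayPublishEvents(Layer component)
    {
        return component != Layer.Client && component != Layer.Engine
            && component != Layer.ResourceAccess && component != Layer.Resource;
    }

    // True when the list does not prohibit the component from subscribing to events.
    public static bool MaySubscribeToEvents(Layer component)
    {
        return component != Layer.Engine && component != Layer.ResourceAccess
            && component != Layer.Resource;
    }
}
```

A lookup like this cannot express the per-use-case don'ts (for example, a Client calling multiple Managers in the same use case); those still require reviewing the call chains themselves.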
一般的
不要设计时钟。
永远不要设计一个没有能够囊括波动性的架构的项目。
捕获并验证规划假设。
遵循项目设计的设计。
为项目设计几种方案;至少设计正常、压缩和亚临界解决方案。
以 Optionality 的方式与管理层沟通。
在主要工作开始之前务必经过 SDP 审查。
General
Do not design a clock.
Never design a project without an architecture that encapsulates the volatilities.
Capture and verify planning assumptions.
Follow the design of project design.
Design several options for the project; at a minimum, design normal, compressed, and subcritical solutions.
Communicate with management in Optionality.
Always go through SDP review before the main work starts.
人员配备
避免多个建筑师。
一开始就建立一个核心团队。
仅要求配备沿着关键路径畅通无阻地前进所需的最低级别的人员。
始终根据浮动时间分配资源。
确保正确的人员分配。
确保计划挣值的S曲线呈平缓。
始终按照 1:1 的比例将组件分配给开发人员。
争取任务的连续性。
Staffing
Avoid multiple architects.
Have a core team in place at the beginning.
Ask for only the lowest level of staffing required to progress unimpeded along the critical path.
Always assign resources based on float.
Ensure correct staffing distribution.
Ensure a shallow S curve for the planned earned value.
Always assign components to developers in a 1:1 ratio.
Strive for task continuity.
一体化
避免大量的集成点。
避免在项目结束时进行集成。
Integration
Avoid mass integration points.
Avoid integration at the end of the project.
估计
不要高估。
不要低估。
追求准确,而不是精确。
在任何活动评估中始终使用五天的时间量。
对整个项目进行评估以验证甚至启动您的项目设计。
减少估计的不确定性。
在需要时,保持正确的估计对话。
Estimations
Do not overestimate.
Do not underestimate.
Strive for accuracy, not precision.
Always use a quantum of five days in any activity estimation.
Estimate the project as a whole to validate or even initiate your project design.
Reduce estimation uncertainty.
When required, maintain correct estimation dialog.
项目网络
将资源依赖性视为依赖项。
验证所有活动均位于以关键路径开始和结束的链上。
确认所有活动均已分配资源。
避免使用节点图。
更喜欢箭头图。
避免神活动。
将大型项目分解成网络网络。
将近乎关键的链视为关键链。
力争将圈复杂度降低至 10 到 12。
分层设计以降低复杂性。
Project network
Treat resource dependencies as dependencies.
Verify all activities reside on a chain that starts and ends on a critical path.
Verify all activities have a resource assigned to them.
Avoid node diagrams.
Prefer arrow diagrams.
Avoid god activities.
Break large projects into a network of networks.
Treat near-critical chains as critical chains.
Strive for cyclomatic complexity as low as 10 to 12.
Design by layers to reduce complexity.
时间和成本
首先通过快速、干净的做法而不是压缩来加速项目。
永远不要承诺处于死亡地带的项目。
使用并行工作而不是顶级资源进行压缩。
谨慎、明智地使用顶级资源进行压缩。
避免压缩率高于 30%。
避免效率高于25%的项目。
即使采取任何压缩选项的可能性很低,也要压缩项目。
Time and cost
Accelerate the project first by quick and clean practices rather than compression.
Never commit to a project in the death zone.
Compress with parallel work rather than top resources.
Compress with top resources carefully and judiciously.
Avoid compression higher than 30%.
Avoid projects with efficiency higher than 25%.
Compress the project even if the likelihood of pursuing any of the compressed options is low.
风险
定制您的项目的关键性风险范围。
利用活动风险调整浮动异常值。
将正常解决方案减压至风险曲线上的临界点。
目标减压至0.5风险。
比起具体的风险数字,风险临界点的评价更为重要。
不要过度减压。
解压分层设计的解决方案,或许可以采取积极的方式。
保持正常解决方案的风险低于 0.7。
避免风险低于 0.3。
避免风险高于 0.75。
避开选择风险更大或者比风险更安全的项目的交叉点。
Risk
Customize the ranges of criticality risk to your project.
Adjust float outliers with activity risk.
Decompress the normal solution past the tipping point on the risk curve.
Target decompression to 0.5 risk.
Value the risk tipping point more than a specific risk number.
Do not over-decompress.
Decompress design-by-layers solutions, perhaps aggressively so.
Keep normal solutions at less than 0.7 risk.
Avoid risk lower than 0.3.
Avoid risk higher than 0.75.
Avoid project options riskier or safer than the risk crossover points.
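The numeric risk guidelines above can be captured in a few lines. This is a sketch only: RiskGuidelines is a hypothetical name, it encodes just the four numbers in the list, and it says nothing about tipping points or crossover points, which depend on the risk curve of the specific project.

```csharp
using System;

static class RiskGuidelines
{
    public const double Lowest = 0.3;               // avoid risk lower than this
    public const double Highest = 0.75;             // avoid risk higher than this
    public const double NormalCeiling = 0.7;        // normal solutions stay below this
    public const double DecompressionTarget = 0.5;  // target for decompression

    // True when the risk of a project option is neither too low nor too high.
    public static bool IsWithinRange(double risk)
    {
        return risk >= Lowest && risk <= Highest;
    }

    // True when a normal solution respects both the range and its own ceiling.
    public static bool IsAcceptableNormalSolution(double risk)
    {
        return IsWithinRange(risk) && risk < NormalCeiling;
    }
}
```

A check like this is a screening aid; per the list, the risk tipping point carries more weight than any specific number.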
对活动的内部阶段采用二元退出标准。
Adopt binary exit criteria for internal phases of an activity.
为所有活动分配一致的阶段权重。
Assign consistent phase weights across all activities.
每周跟踪进度和努力。
Track progress and effort on a weekly basis.
切勿以功能为依据来报告进度。
Never base your progress reports on features.
始终根据集成点来报告进度。
Always base your progress reports on integration points.
跟踪近乎关键的链的浮动。
Track the float of near-critical chains.
设计可重复使用的服务契约。
Design reusable service contracts.
遵守服务合同设计指标
避免签订单一操作的合同。
争取每个服务合同有 3 到 5 个操作。
避免签订涉及超过 12 个操作的服务合同。
拒绝包含 20 个或更多操作的服务合同。
Comply with service contract design metrics
Avoid contracts with a single operation.
Strive to have 3 to 5 operations per service contract.
Avoid service contracts with more than 12 operations.
Reject service contracts with 20 or more operations.
避免类似属性的操作。
Avoid property-like operations.
将每个服务的合同数量限制为 1 或 2 个。
Limit the number of contracts per service to 1 or 2.
避免将工作移交给下级。
Avoid junior hand-offs.
仅让架构师或有能力的高级开发人员设计合同。
Have only the architect or competent senior developers design the contracts.
Absolute criticality, color coding floats, 203–204, 267
Abstract structural dependencies. See also Activity dependencies, 339
Actions, corrective. See Corrective actions
Activities. See also Activity dependencies; Scheduling activities
activity network. See Project network
color-coding for risk, 240–243
creating a list of, 158, 260–261
formula for effort of an activity for tracking, 392
god. See also God service, 15, 30, 68, 307–308
life cycle and project tracking, 388–392
quality-assurance activities, 381–382
quality-control activities, 380–381
TradeMe project design case study, 346–347
what is an activity, 195
Activity dependencies. See also Project network
arrow versus node diagrams, 197–198
critical path analysis, 168–169
dummy activities, 197
and milestones, 262
in the network diagram, 195
the node diagram/arrow diagram, 196
project design, TradeMe project design case study, 339–341
resources and, 176
working in parallel, removing, 211–212
Activity diagrams. See also Use cases
the business logic layer, 62–63
TradeMe system design case study, 105
Activity life cycle
Activity risk
calculation pitfall, 247
versus criticality risk, 248
geometric activity risk, 317–320
introduction to and formula, 245–246
project design, in action, 293–294
TradeMe project design case study, 356
Actor model, future-looking design, 121
Amos Tversky. See Prospect theory (Kahneman, Tversky)
Analysis, avoiding paralysis in, 7–8
Anti-design effort
functional decomposition, 20–21
TradeMe system case study, 106–108
Architects
importance of having a single, 147–148
job description, 150
role and responsibility, 194
Architecture. See System design
Arithmetic risk, versus geometric risk, 316–319
Arrow diagram
versus the node diagrams, 197–198
project design, in action, 261–262, 272, 273, 278, 287, 351
project design in action, 283
TradeMe project design case study, 342, 347
Assembly, instructions, project design, 141
Assigning resources. See Resources
Atomic business verbs, 64, 69, 81, 93, 115, 417
Atomic business verbs, in naming, 66
Axes of volatility. See also Volatility-based decomposition
decomposition of a house, 39–40
volatilities list and, 42
Bad habits. See Siren song of bad habits
Behavioral dependencies. See also Activity dependencies
versus values, 370
Behaviors, required. See also Use cases
business logic layer and implementing required, 61–62
classification guidelines, 65
not values, risk tipping point, 370
rather than required functionality, 56–57
Best practices. See also Design standard
to accelerate projects, 207–210
importance of following, 159
quality-assurance and, 381–382
risk metrics, 253
Big projects. See Large projects
Broadband estimation
in design of project design, 370
in TradeMe project design case study, 338
Brooks, Frederick Dr., 8–9, 401
Business logic
calling ResourceAccess by, 78
in clients with functional decomposition, 15–16, 29
in clients with open architecture, 75
Calendar dates, scheduling activities, 176
Call chain diagrams. See also Use cases
architecture validation and, 87–88
and creating dependencies, 339
project design in action, 257–259
TradeMe system design case study, 124–135
Changes, in requirements. See Requirements analysis
Chimera, multiple architects issue, 148
Classification guidelines
introduction to, 65
Managers-to-Engines ratio, 67–68
Clients
bloat and coupling in functional decomposition, 15–17, 107
client designs and working in parallel, 278–279
compression with simulators, 284–285
infrastructure and client designs first, 280–283
integration process, 146
open architecture example, 75–76
opening the architecture, 79
relevant design questions, 66–67
TradeMe system design case study, 117, 119–120
volatility decreases top-down, 68
volatility in TradeMe system design case study, 114–115
Closed architecture. See also Open architecture
introduction to, 76
opening, 79
semi-closed architecture, 76–77
Color-coding
activities by risk, 240–243, 271, 316
Communication
with developers and managers, 8–9, 55, 65, 68, 198
optionality, in design standard, 428
through network diagrams, 198
Competitors, identifying volatilities, 50–51
Complexity, project
cyclomatic. See Cyclomatic complexity
parallel work, 213
reducing for TradeMe project, 340–341
reducing with The Method, 76
Components. See Services
Composable design
definition, in complexity, 326
definition, in TradeMe project design case study, 129
in design standard, 425
Composition. See also Requirements analysis
introduction to, 83
requirements and changes, 83–85, 92–93
smallest set of components, 89–90
Compression, of schedule. See also Parallel, working in; Project network
compared with normal solution, 287–289
full compression solution, 216
infrastructure and client designs first, 280–283
introduction to, 210
maximum, 216, 232, 367, 429
network compression, 231
parallel work and cost, 213–214
risk and, 237, 248–249
with top resources, 368
TradeMe project design case study, 346–350
trimming the fuzzy front end, 369
using a second top developer, 278
using a top developer, 276–278
working in parallel, 211–213, 278–280, 322
working with better resources, 210–211
Connection volatility, trading system, 43
Connectivity, cyclomatic complexity, 321
Constraints, through design decision tree, 8
Construction
benefits of incremental, 70–72
layers and, 334
Contracts, designing service contracts. See Service contract design
Conventions, naming services, 65–66
Conway’s law (Melvin Conway), countering, 331
Core team
cost elements, 229
designing the fuzzy front end, 151
initial project staffing, 149–150
and staffing distribution, 177–178
TradeMe project, normal solution, 341
Core use cases. See also Use cases
diagrams supporting support of, 88–89
smallest set of components, 86–87, 89–90
TradeMe system design case study, 104–105
Corrective actions
Correlation models, time-cost, 291–292
Cost. See Direct cost; Project costs
Coupled client, functional decomposition, 15–17, 107
Coupled design, tight vs. loose, 156
Coupled services, functional decomposition, 14, 17–19
Critical path analysis. See also Effort estimations
compressing activities, 231–232
history of, 199
network diagrams. See Network diagrams
proactive project management, 204
Criticality risk. See also Risk
versus activity risk, 248
introduction to and formula, 241–244
metrics, 253
project design in action, 295
subcritical staffing, 272–274, 322
TradeMe project solution comparison, 355
visualizing total floats by, 203–204
Crossover point, of risk
acceptable risk and design options, 312
Customers
interview using axes of volatility method, 37–38
product managers relationship with, 150
Cyclomatic complexity
in design standard, 429
designing a network of networks, 329
designing by layers to reduce, 333, 350
formula for, 321
measuring service contracts, 420
problems with functional design, 16–17
project type and complexity, 322
Deadlines
importance of staying on the plan, 399
overestimating, 403
underestimating, 400
Death zone, in time-cost curve, 218–220, 223
in project design in action, 292
Debriefing, project design, 378–379
Decision-making, educated decisions, 151–152
Decomposition. See also Domain decomposition; Functional decomposition; Volatility-based decomposition
introduction to, 13
maintenance and development and, 32–33
Decompression target. See also Risk decompression
recommended ideal, 301
DeMarco, Tom, 174
Dependencies. See Activity dependencies
Dependency chart. See also Network diagrams
project activity network, 167–168
project design in action, 258–260
Design, project. See Project design
Design standard
directives, 426
introduction to, 425
project design guidelines, 427–429
project tracking guidelines, 429–430
service contract design guidelines, 430
system design guidelines, 426–427
Design, system. See System design
Developers
assigning services to, 153–155
assigning to critical path activities, 172–173
design and team efficiency, 155–157
float-based assignment, 173, 205–206
importance of design communication, 8
junior versus senior developers, 210–211
junior versus senior project hand-off, 374–376
project managers relationship with, 149
senior developers as junior architects, 376–377
training and improving skills new technology, 208–209, 381–382
Diagrams
activity. See Activity diagrams
network. See Network diagrams
Direct cost
versus indirect cost, 229
introduction to, 222
and risk, 241
and risk decompression, 297
risk models and minimum, 301
time-cost curve and minimum, 299–300
total, direct and indirect costs, 223–224
Discrete modeling, time-cost curve, 217
Domain decomposition
building a domain house, 23–24
TradeMe system design case study: anti-design effort, 108
Dummy activities, 197
Dunning-Kruger effect, 36
Duration, project. See Project duration
Earned value planning
discerning mistakes through, 189–190
in project design in action, 266, 271, 274
as a project design tool, 187–189
in throughput analysis, 287–289
for tracking with project progress, 393–394
TradeMe project compressed solution, 348
TradeMe project normal solution, 344
TradeMe project subcritical solution, 353
Educated decisions, making, 151–152
Efficiency. See Project efficiency
Effort estimations. See also Critical path analysis
architecture versus estimations, 365–366
estimation techniques, 160–162
estimation tools, 163
importance of debriefing, 378
overall project estimation, 162–165, 394
overestimating and corrective actions, 402–403
resource leak and corrective actions, 401–402
TradeMe project design case study, 335–338
underestimating and corrective actions, 400–401
Effort, versus scope calculations, 372–373
Elasticity of staffing, project efficiency and, 186
Emulators development. See also Simulators, 212
Engines
the business logic layer, 62–63, 117
relevant design questions, 66–67
in TradeMe system design, 114–117
Estimations, effort. See Effort estimations
Execution complexity, resulting from compression. See also Complexity, project; Cyclomatic complexity, 213, 228, 249
Experts, external
providing access to, 209
TradeMe project design case study, 342
Exponential criticality, visualizing floats, 203
Extensibility, benefits of, 72
External service protocols, 74
Facets, contracts as, 412
Factoring contracts
factoring up, 419
Feasibility, time-cost curve and. See also Death zone, in time-cost curve, 218–220
Feast or famine cycles, staffing distribution, 177
Feed Me/Kill Me meetings. See also Educated decisions, making; SDP Review, 153
Fibonacci risk
compared with criticality and activity risk, 248
formula for and introduction, 244–245
geometric, 317
Financial analysis, project, 364
Floats
assigning resources, 173
dedicated resources and, 275
float-based assignment, 173, 175, 205–206
proactive project management, 204
Formulas
activity risk, 246
criticality risk, 242
cyclomatic complexity, 321
effort of an activity, 392
geometric activity risk, 318
geometric criticality risk, 316
geometric Fibonacci risk, 317
The Method, 4
PERT (Program Evaluation and Review Technique), 162
planned earned value, 187
progress of an activity, 391
project cost, 184
project effort, 394
project progress status, 392
time for completing an activity, 169
total cost, 223
Functional decomposition
avoid, in design standard, 426
compared with volatility-based decomposition, 32–33
example of functional house, 21–22
example of functional trading system, 27–30
handling requirement changes, 92
physical versus software systems, 26–27
TradeMe system design case study: anti-design effort, 106–108
Fuzzy Front End
compressing, 369
core team responsibilities, 151
with cost elements, 229
gathering assumptions, 265
introduction, 151
GE (General Electric), time-cost curve history, 225
Geometric risk
Fibonacci risk, 317
God activities. See also Geometric risk; God service, 307–308, 319–320, 429
God service, 15, 30, 68, 106
Golden ratio. See also Phi
in Fibonacci risk, 244
Guidelines, classification. See Classification guidelines
Hand-off, design
with contract design, 423
in design standard, 430
introduction to, 374
junior hand-off, 375
senior developers as junior architects, 376–377
senior hand-off, 375
Hierarchy of needs, project design, 141–144
Historical records, project estimation, 163
Homer’s Odyssey. See Siren song of bad habits
HTTP, microservices, 74
IDesign
customer network drawing tool, 197
introduction to, 3
managers to engines ratio, 68
number of core use cases, 86
TradeMe project design case study, 95
TradeMe system design case study, 335
Indirect costs
versus direct cost, 229
and risk, 228
versus total cost, 230
total, direct and indirect costs, 223–224
Infrastructure
at the beginning of the project, 267–268, 341
and client designs first, 280–283
first, with limited resources, 269
investing in, 208
Inherited dependencies
introduction, 258
in TradeMe project, 341
Interaction rules. See Design don’ts
Junior architects
supporting the architect, 149
Junior developers
Kahneman, Daniel. See Prospect theory (Kahneman, Tversky)
Large projects
complex systems and fragility, 325–327
designing a network of networks, 328–331
Layered design
introduction to, 58–59, 332–333
layers and construction, 334
open and closed architectures, 75–79
reuse, 69
TradeMe system design case study, 116–119
typical layers in The Method, 60–61
Life cycle, of activity
Load leveling, 184
Locale volatility
TradeMe system design case study, 115
trading system example, 44
Logistic function
compression and complexity, 323
decompression target, 251–252, 313
overview of, 192
Management
communicating in optionality to, 366–367
presenting multiple project plans, 153
presenting projections, 404–405
Manager, product, job description, 150
Manager, project, job description, 149
Managers
calling Engines, 78
compression with simulators, 284–285
creating subsystems, 70
in TradeMe system design, 114–117
Megaprojects. See Large projects
Message bus
for extensibility and flexibility, 109
in Message Is (application), 120
for notification, 115
in TradeMe project design case study, in abstract dependencies, 339
in TradeMe project design case study, in nonstructural activities, 337
in TradeMe project design case study, in overriding dependencies, 340–341
in TradeMe project design case study, in structural activities, 336
in TradeMe system design case study operational concepts, 119–121
Message Is (application)
with actor model, 121
in TradeMe system design case study, 131
Metcalfe’s law, 326
The Method
classification guidelines, 65–70
eliminating analysis-paralysis, 7–8
time crunch, 6
typical layers in The Method, 60–61
Metrics
collecting and analyzing, 382
contract design metrics, 419–423
risk metrics, 253
Microservices
Microsoft Project, 170, 176, 202
Milestones
arrow diagrams and, 197
project design in action, 262
public and private milestones, 262
TradeMe project design case study, 341
Minimum cost, services, 411, 412
Minimum direct cost
in project design in action, 299–300, 301
in TradeMe project design case study, 310, 358–359
Minimum duration solution, time-cost curve, 216, 218
Models
project design in action risk models, 300–302
project design in action time-cost model, 291–292
TradeMe project design case study risk models, 356
TradeMe project design case study time-cost models, 358–359
Modules. See Services
Naming conventions, services, 65–66
Network compression. See Compression, of schedule
Network diagrams. See also Critical path analysis
arrow versus node diagrams, 197–198
introduction, 167
node diagram, 196
project design in action, 261–262, 271, 273, 278, 283, 287
TradeMe project case study, 342, 347, 351
Network of networks
benefits of, 328
countering Conway’s law, 331
Network, project. See Project network
Node diagram
versus the arrow diagram, 197–198
introduction to, 196
Nonbehavioral dependencies, TradeMe project, 340
Noncoding activities, TradeMe project, 338
Nonstructural coding activities, TradeMe project, 336–337
Normal solution
decompression, 250
introduction to, 215
and minimum direct cost, 251–252
and minimum total cost, 225–228
project design in action, 265–276
and the risk curve, 238
risk metric guideline, 253
On the Criteria to Be Used in Decomposing Systems into Modules (Parnas), 34
Open architecture. See also Closed architecture
issues of calling up and sideways, 107
Operational concepts, TradeMe project, 119–122, 340
Operational dependencies. See also Activity dependencies, 339–340
Optimal project design point
with risk, 252
in project design in action, 301
in TradeMe project design case study, 359–360
Optionality, communication with management, 366–367
Outliers floats, adjusting
with geometric activity risk, 318
project design in action, 295
in TradeMe case study, 356
Parallel life cycles, 374
Parallel, working in
compression with simulators, 284–285
infrastructure and client designs first, 280–283
multiple developers per service, 154
parallel work candidates, 212–213
project complexity and, 322
project design in action, 278–280
senior developers as junior architects, 376
splitting activities, 211, 280
TradeMe project design case study, 346–347
work and cost, 213
Parkinson’s law
increased risk, and, 239
overestimation and, 158
probability of success, and, 159
time crunch, 6
Parnas, David. See On the Criteria to Be Used in Decomposing Systems into Modules (Parnas)
Peer reviews, importance of, 148, 210, 381
PERT (Program Evaluation and Review Technique), 162, 422
Phases, project
project design in action, 264
TradeMe project design case study, 342
Phi. See also Golden ratio, 244–245, 317
Planning assumptions
calendar dates and, 176
in debriefing, 378
in design of project design, 271–272
in design standard, 428
in project design in action, 263–265
in TradeMe project design case study, 341
Plans/planning. See also Earned value planning; SDP (Software Development Plan) review
benefits of having multiple, 152
importance of staying on, 399
STP (service test plan), 389
Practicing
identifying areas of volatility, 34–36, 52–53
Presentation layer, architecture, 60–61
Process lead. See Architects
Product manager, job description, 150
Program Evaluation and Review Technique (PERT), 162, 422
Progress. See Project progress; Project tracking; Reporting progress
Project complexity
cyclomatic. See Cyclomatic complexity
reducing for project design in action, 259, 267
reducing for TradeMe project, 340–341
reducing with The Method, 76
Project costs. See also Compression, of schedule; Effort estimations
decompression targets relation to, 251–252
financial analysis, 364
importance of educated decisions, 151–152
importance of having multiple plans, 152–153
modular system design and, 409–411
in project design in action, 265, 274, 278, 283, 285–286
project efficiency, 186
staffing distribution chart and calculating, 184–185
total, direct and indirect costs, 222–224
in TradeMe project design case study, 345–346, 349–350, 354, 358–359
Project design
choosing the normal solution, 275–276
communicating with developers/managers, 8–9
compression, throughput analysis, 287–289
creating a design for the project design, 370–372
critical path analysis. See Critical path analysis
the death zone, 218–220, 223, 292–293
earned value planning, 187–194
effort estimations. See Effort estimations
general guidelines, 365
hierarchy of needs pyramid, 141–144
importance of debriefing, 378–379
importance of practice, 377–378
introducing parallel work, 278–280
introduction to, 139
project cost/efficiency, 184–187
quality. See Quality
risk. See Risk
risk crossover point. See Crossover point, of risk
roles and responsibilities, 194
scheduling activities, 176–183
services and developers, 153–157
standard design guidelines, 427–429
when to design a project, 361–363
Project design, in action
compression using a second top developer, 276–278
diagrams, 283
earned value planning in, 271
efficiency analysis, project design in action, 289–290
milestones, 262
network diagrams, 261–262, 278, 287
outliers floats, adjusting, 295
planning assumptions in, 263–265
SDP (Software Development Plan) review, 303–304
staffing, 266
time-cost curve, 290–292, 298–300
Project design, TradeMe case study
comparing the options, 355
dependencies and project network, 339–341
individual activity estimations, 335–338
introduction to, 335
overall project estimation, 338
planning assumptions, 341
preparing for the SDP review, 359–360
Project duration
accelerating software projects, 207–210
parallel work and cost, 213
schedule compression introduction, 210
time-cost curve. See Time-cost curve
working with better resources, 210–211
Project efficiency
efficiency analysis, project design in action, 289–290
in TradeMe project, 345–346, 349–350, 354
Project hand-off. See Hand-off, design
Project life cycles, 374
Project managers
assigning floats, 204
estimations and tracking, 160–161
overview of job description, 149
Project metrics, collecting and analyzing, 382
Project network
network diagrams. See Network diagrams
project efficiency and, 187
Project phases, 264
Project plans. See Plans/planning
Project progress. See also Tracking projects
combined with tracking effort, 395–396
progress and earned value, 393
tracking progress and effort, 395–396
Project tracking. See Tracking projects
Projections. See also Tracking projects
building trust with management, 404–405
overestimating and corrective actions, 402–403
project tracking and, 404
resource leak and corrective actions, 401–402
staying on the plan, 399
underestimating and corrective actions, 400–401
Projects, large. See Large projects
Property-like operations, service contracts and, 421
Prospect theory (Kahneman, Tversky), 236
Protocols, internal and external service protocols, 74–75
Pub/Sub
in closed architecture, 78, 79
introduction, 46
in message bus, 118
in project design in action, 267, 281
in Utilities bar, 65
Quality
accelerating the project through quality assurance, 207–208
failure of large projects, 327
quality-assurance activities, 381–382
quality-control activities, 380–381
Queueing
design don’ts, 80
message bus, 118
Ratio
developers to services, 153–155
direct cost to indirect cost, 229
golden, 244
Regression testing
impossibility in functional system, 26
with infrastructure investment, 208
quality-control activities and, 381
with test engineers, 208
volatility-based decomposition, 34
Relative criticality, visualizing floats, 203
Repeatability level, hierarchy of needs, 142–143, 209
Reporting progress
integration versus features, 146–147
project manager’s function, 149
Required behaviors. See also Use cases
business logic layer and, 61–62
classification guidelines, 65
rather than required functionality, 56–57
Requirements analysis
changes in requirements, 83–85, 92–93
creating a volatilities list, 42
functional decomposition, 14
functional trading system example, 27–30
solutions masquerading as requirements, 40–42
TradeMe system design case study, 112–116
volatility-based decomposition, 34–35
volatility-based example, 42–47
ResourceAccess layer
closed architecture problems, 77
compression with simulators, 284
open architecture example, 75–76
relevant design questions, 66–67
reuse increases top-down, 69
TradeMe system design case study, 115
volatility decreases top-down, 68–69
Resources
assigning services to developers, 153–157
compressing with top resources, 368
core team. See Core team
critical path analysis, 171–173
efficiency and staffing elasticity, 186
floats-based scheduling, 205–206
leaks and corrective actions, 401–402
reasons for project design, 139–141
staffing and cost elements, 228–230
staffing distribution chart, 177–179
staffing distribution mistakes, 179–183
task continuity and, 157
TradeMe project design case study, 341–342, 345–346, 348–350, 351–352
underestimating and adding, 401
REST/WebAPI, microservices, 74
Return on investment (ROI)
debriefing project design, 378
network compression, 231
when to design a project, 361
Reuse
contracts as elements of, 414–415
factoring contracts and, 415–419, 422
increases top down, 69
ResourceAccess reuse, 64
Reviews, system level, 381
Risk
activity risk. See Activity risk
criticality risk. See Criticality risk
crossover point. See Crossover point, of risk
and direct cost, 241
execution risk, 249
Fibonacci risk. See Fibonacci risk
and floats, 199–200, 203–204, 206, 240–241
geometric activity risk, 317–318
geometric criticality risk, 316–317
geometric Fibonacci risk, 317
geometric risk. See Geometric risk
importance of having an architect, 147–148
and indirect cost, 228
introduction to, 235
metrics, 253
planning and, 370
project design in action, 293–294, 302–303
reasons for project design, 139–141
TradeMe project design case study, 355–359
Risk decompression
decompression target, 251–252, 313–315
how to, 250
project design in action, 295–298
TradeMe project design case study, 355–359
ROI. See Return on investment (ROI)
Roles and phases table
project design in action, 264
TradeMe project design case study, 342
S curve, shallow. See Shallow S curve
Scheduling activities. See Compression, of schedule; Critical path analysis; Effort estimations; Floats; Project network
Scope
projections and changes in, 404–405
SDP (Software Development Plan) review
cost elements, 229
overview of, 153
project design in action, 262, 303–304
staffing distribution chart, 177–178
TradeMe project design case study, 341, 359–360
Security
in TradeMe system design case study, 115
in Utilities bar, 65
volatility, trading system example, 42–43, 47
Semi-closed/open architecture, 76–77
Senior developers
working with better resources, 210–211
Sequence diagrams
TradeMe system design case study, 133
Service contract design
design challenge, 423
design standard guidelines, 430
factoring contracts introduced, 415–416
factoring up, 419
introduction to, 412
services and contracts, 411–415
Service protocols, internal and external, 74–75
Service requirement specification (SRS), 389
Service test plan (STP), 389
Services
assigning developers to, 153–157
bloating and coupling, functional services, 17–19
business logic service, 117–119
clients: bloating and coupling, 15–17
development life cycle, 388–389
granular in actor model, 121
service-level testing, 380
smallest set of components, 89–90
using services to cross layers, 59–60
Shallow S curve. See also Earned value planning
in project design in action, 266, 271, 274
throughput analysis, 287
tracking project progress, 395
in TradeMe project design case study, 344, 349, 353
Simulators
compared with the normal solution, 286
compression using the, 284–286
for god activities, 308
removing dependencies, 212
simulators solution in project design in action, 284–286
Siren song of bad habits, 47–48
Skills
identifying areas of volatility, 34–36, 52–53
improving development skills, 208–209
practicing project design, 377–378
Smoke tests, daily, 380
Software architect. See Architects
Software Development Plan review. See SDP (Software Development Plan) review
Splitting activities, 211, 280
SRS (service requirement specification), 389
Staffing. See also Resources
calculating project cost, 184–185
distribution mistakes, 179–183
elasticity of, 186
initial staffing in project design, 147–151
planning requirements, 263–265
project design in action, 266, 285–286
smooth staffing distribution, 183
TradeMe project design case study, 341–342, 345–346, 348–350, 351–352
Standard operating procedures (SOPs), quality assurance, 382
Standards
for design. See Design standard
Static architecture
project design in action, 256–257
TradeMe system design case study, 116
Status, of projects. See Project progress
Status reports. See Reporting progress
Storage volatility, trading system example, 43, 46
STP (service test plan), 389
Structural activities, TradeMe project, 336–337
Structural dependencies, abstract, 339. See also Activity dependencies
Structure
classification guidelines, 65–70
client and business logic layers, 60–63
introduction to, 55
open and closed architectures, 75–79
subsystems and services, 70–75
typical layers in The Method, 60–61
use cases and requirements, 56–58
Subcritical staffing
compared with load leveling, 183
introduction, 172
project complexity and, 322
project design in action, 272–274
in staffing distribution chart, 180
TradeMe project design case study, 353–354
Subsystems
Success
probability as a function of estimation, 158–159
Swim lanes, activity diagrams, 105, 112, 124–130
System design. See also Layered design; Requirements analysis
architecture versus estimations, 365–366
benefits of extensibility, 72
composition. See Composition
decomposition. See Decomposition
design standard guidelines, 426–427
introduction to The Method, 4–5
modular system design, 409–411
structure. See Structure
System design, TradeMe case study
design validation, 124
glossary: four classic questions, 112
identifying areas of volatility, 112–116
introduction to, 95
legacy system: use cases, 100–104
Message Is (application), 120–121
new system/the company, 99
System-level reviews, quality-control activities and, 381
Target, decompression
project design in action, recommended, 301
TradeMe project and finding the, 358–359
Task continuity, assigning resources, 157
Team. See also Core team
debriefing, 379
design and team efficiency, 155–157
Technical manager. See Architects
Technology, training developers in new technology, 208–209, 381–382
Test engineers
accelerating the project, 208
in quality control activities, 380
Testing. See also Quality
functional and domain decomposition, 25–26
quality-control activities, 380–381
software testers, 208
STP (service test plan), 389
system testing, 380
volatility-based decomposition, 34
The Method. See Method
Thermodynamics, law of, 20
Throughput analysis, compression and, 287–289
Time-cost curve
avoiding classic mistakes, 217–218
discrete modeling, 217
finding normal solutions, 220–221
first actual time-cost curve, 225–226
introduction to, 214
project design in action, 290–292, 298–300
Time crunch, benefits of a, 6
Time, project. See Project duration
Time-risk curve
actual time-risk curve, 237–239
project design in action, 295–298, 300–301, 303
TradeMe project design case study, 356–358
Timeline, subsystems and, 373–374
Tools, estimation, 163
Total cost
versus indirect cost, 230
normal solution and minimum, 225–228
total, direct and indirect costs, 223–224
Total float. See also Floats
proactive project management, 204
Tracking projects
accumulated effort, 394
accumulated indirect cost, 394–396
activity life cycle and status introduction, 388–389
effort estimations and, 160–161
overestimating and corrective actions, 402–403
project progress and earned value, 392
projections and corrective actions, 398–399
projections introduction, 396–398
resource leak and corrective actions, 401–402
standard design guidelines, 429
tracking progress and effort, 395–396
underestimating and corrective actions, 400–401
TradeMe case study. See Project design, TradeMe case study; System design, TradeMe case study
TradeMe project design case study. See Project design, TradeMe case study
Training developers, in new technology, 208–209, 381–382
Transparency, communication and, 8
Trend lines
project design in action, time cost, 291
project design in action, time risk, 300
TradeMe project design case study, time risk, 357
UI (user interface) development, task continuity, 157
UML activity diagram. See Activity diagrams
UML sequence diagram. See Sequence diagrams
Underestimating projects
Uneconomical zone, time-cost curve, 216
Unit testing, decomposition, 25–26
Universal principles
design iteratively, build incrementally, 71
features as aspects of integration, 91
Never design against the requirements, 84
volatility-based decomposition, 32–33
Use cases
call chain diagrams and, 87–88
distilling dependencies from, 339
introduction to swim lanes, 105
smallest set of components, 86–87, 89–90
TradeMe system design case study, 100–105, 124–135
User interface (UI) development, task continuity, 157
Utilities
introduction, 65
problems in a closed architecture, 77–78
TradeMe system design case study, 115
Validation. See also Call chain diagrams; Sequence diagrams
architecture, 87
TradeMe system design case study, 124
Values, behavior vs., 370
Variable, volatile vs., 37
Vision, TradeMe system design case study, 109–110
Volatility-based decomposition
compared with functional decomposition, 32–33
definition and benefits of, 30–32
design for your competitors, 50–51
in design standard, 426
house (axes of volatility) example, 39–40
solutions masquerading as requirements, 40–42
testing, 34
TradeMe system design case study, 112–116, 122–123
volatile versus variable, 37
volatility and the business, 48–50
volatility-based trading system example, 42–47
Von Moltke, Helmuth. See Tracking projects
WCF (Windows Communication Foundation), 73
Wideband Delphi estimation technique, 163–164
Windows Communication Foundation (WCF), 73
Workdays, calendar dates to, 176
Workflow Manager
choosing a workflow tool, 123
definition, 122